2 Troubleshooting and Diagnostics
This section includes information about troubleshooting hardware component faults for the Oracle Server X9-2L. It contains the following topics:
For more information about server troubleshooting and diagnostics, refer to the Oracle x86 Servers Diagnostics and Troubleshooting Guide at Oracle x86 Servers Administration, Diagnostics, and Applications Documentation.
Troubleshooting Server Component Hardware Faults
This section contains maintenance-related information and procedures that you can use to troubleshoot and repair server hardware issues. The following topics are covered.
Troubleshooting Server Hardware Faults
When a server hardware fault event occurs, the system lights the Fault-Service Required LED and captures the event in the Oracle ILOM event log. If you set up notifications through Oracle ILOM, you also receive an alert through the notification method you choose. When you become aware of a hardware fault, address it immediately.
To investigate a hardware fault, see the following:
Basic Troubleshooting Process
Use the following process to address a hardware fault (for the step-by-step procedure, see Troubleshoot Hardware Faults Using the Oracle ILOM Web Interface).
-
Identify the server subsystem containing the fault.
You can use Oracle ILOM to identify the failed component.
-
Review the Oracle Server X9-2L Product Notes at Oracle Server X9-2L Documentation.
The product notes contain up-to-date information about the server, including hardware-related issues.
-
Prepare the server for service using Oracle ILOM.
If you determined that the hardware fault requires service (physical access to the server), use Oracle ILOM to take the server offline, activate the Locate button/LED, and if necessary, power off the server.
-
Prepare the service workspace.
Before servicing the server, prepare the workspace, ensuring electrostatic discharge (ESD) protection for the server and components.
-
Service the components.
To service the components, see the removal, installation, and replacement procedures in this document.
Note:
A component designated as a field-replaceable unit (FRU) must be replaced by Oracle Service personnel. Contact Oracle Service.
-
Clear the fault in Oracle ILOM.
Depending on the component, you might need to clear the fault in Oracle ILOM. Generally, components that have a FRU ID clear the fault automatically.
Troubleshoot Hardware Faults Using the Oracle ILOM Web Interface
Note:
The screens shown in this procedure might differ from those for your server.
This procedure uses the basic troubleshooting steps described in Basic Troubleshooting Process.
Use this procedure to troubleshoot hardware faults using the Oracle ILOM web interface and, if necessary, prepare the server for service.
Note:
This procedure provides one basic approach to troubleshooting hardware faults. It uses the Oracle ILOM web interface. However, you can perform the procedure using the Oracle ILOM command-line interface (CLI). For more information about the Oracle ILOM web interface and CLI, refer to the Oracle ILOM documentation.
Troubleshooting and Diagnostic Information
The following list displays diagnostic and troubleshooting-related procedures and references that can assist you with resolving server issues.
-
Oracle x86 Servers Diagnostics and Troubleshooting Guide at Oracle x86 Servers Administration, Diagnostics, and Applications Documentation
-
Oracle X9 Series Servers Administration Guide at Oracle x86 Servers Administration, Diagnostics, and Applications Documentation
-
Troubleshooting Using the Server Front and Back Panel Status Indicators
-
Managing Server Hardware Faults Through the Oracle ILOM Fault Management Shell
Troubleshooting Using the Server Front and Back Panel Status Indicators
This section describes the status indicators (LEDs) located on the front and back of the server, including those found on components and ports. It includes the following topics:
Server Boot Process and Normal Operating State Indicators
A normal server boot process involves two indicators, the service processor SP OK LED indicator and the System OK LED indicator.
When AC power is connected to the server, the server boots into standby power mode:
-
The SP OK LED blinks slowly (0.5 seconds on, 0.5 seconds off) while the SP is starting, and the System OK LED remains off until the SP is ready.
-
After a few minutes, the main System OK LED slowly flashes the standby blink pattern (0.1 seconds on, 2.9 seconds off), indicating that the SP (and Oracle ILOM) is ready for use. In Standby power mode, the server is not initialized or fully powered on at this point.
When powering on the server (either by the On/Standby button or Oracle ILOM), the server boots to full power mode:
-
The System OK LED blinks slowly (0.5 seconds on, 0.5 seconds off), and the SP OK LED remains lit (no blinking).
-
When the server successfully boots, the System OK LED remains lit. When the System OK LED and the SP OK LED indicators remain lit, the server is in Main power mode.
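The indicator combinations described above can be summarized as a small lookup table. The following is a minimal sketch, not an Oracle tool: the pattern labels and the function name are hypothetical, and it assumes the SP OK LED is steady once the SP is ready, which the text implies but does not state for Standby power mode.

```python
# Illustrative labels for LED patterns (not Oracle ILOM identifiers).
OFF = "off"
SLOW_BLINK = "slow_blink"        # 0.5 seconds on, 0.5 seconds off
STANDBY_BLINK = "standby_blink"  # 0.1 seconds on, 2.9 seconds off
STEADY = "steady"                # lit, no blinking

# (SP OK pattern, System OK pattern) -> server state, per the boot sequence.
# Assumption: SP OK is steady once the SP has finished starting.
BOOT_STATES = {
    (SLOW_BLINK, OFF): "SP starting",
    (STEADY, STANDBY_BLINK): "Standby power mode (SP ready)",
    (STEADY, SLOW_BLINK): "Booting to full power mode",
    (STEADY, STEADY): "Main power mode",
}


def describe_boot_state(sp_ok: str, system_ok: str) -> str:
    """Return the boot state for a pair of observed LED patterns."""
    return BOOT_STATES.get((sp_ok, system_ok), "Unrecognized combination")


if __name__ == "__main__":
    print(describe_boot_state(STEADY, STANDBY_BLINK))
```

Any combination outside the four documented pairs is reported as unrecognized rather than guessed at.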
Note:
The green System OK LED indicator and the green SP OK indicator remain lit (no blinking) when the server is in a normal operating state.
Server System-Level Status Indicators
There are seven system-level status indicators (LEDs), some of which are located on both the server front panel and the back panel. For the location of the status indicators, see Front and Back Panel Components. The following table describes these indicators.
Server Fan Status Indicators
Each fan module has one status indicator (LED). The LEDs are located on the chassis fan tray adjacent to and aligned with the fan modules, and are visible when the server top cover is removed.
Status Indicator Name | Icon | Color | State and Meaning |
---|---|---|---|
Fan Status | | Amber | |
Storage Drive Status Indicators
There are three status indicators (LEDs) on each drive.
Status Indicator Name | Icon | Color | State and Meaning |
---|---|---|---|
OK/Activity | | Green | |
Fault-Service Required | | Amber | |
OK to Remove | | Blue | |
Power Supply Status Indicators
There are two status indicators (LEDs) on each power supply. These indicators are visible from the back of the server.
Status Indicator Name | Icon | Color | State and Meaning |
---|---|---|---|
AC OK/DC OK | | Green | |
Fault-Service Required | | Amber | |
Network Management Port Status Indicators
The server has one 100/1000BASE-T Ethernet management domain interface, labeled NET MGT. There are two status indicators (LEDs) on this port. These indicators are visible from the back of the server.
Status Indicator Name | Location | Color | State and Meaning |
---|---|---|---|
Activity | Top left | Green | |
Link speed | Top right | Green | |
Ethernet Port Status Indicators
The server has one 100/1000BASE-T Gigabit Ethernet port (NET 0). There are two status indicators (LEDs) that are visible from the back of the server.
Status Indicator Name | Location | Color | State and Meaning |
---|---|---|---|
Activity | Bottom left | Green | |
Link speed | Bottom right | Bi-colored: Amber/Green | |
Motherboard Status Indicators
The motherboard contains the following status indicators (LEDs).
Status Indicator | Description |
---|---|
DIMM Fault Status Indicators | |
Processor Fault Status Indicators | |
Fault Remind Status Indicator | |
STBY PWRGD Status Indicator | |
Troubleshooting System Cooling Issues
Maintaining the proper internal operating temperature of the server is crucial to the health of the server. To prevent server shutdown and damage to components, you need to address overtemperature and hardware-related issues as soon as they occur. If your server has a temperature-related fault, use the information in the following table to troubleshoot the issue.
Cooling Issue | Description | Action | Prevention |
---|---|---|---|
External Ambient Temperature Too High | The server fans pull cool air into the server from its external environment. If the ambient temperature is too high, the internal temperature of the server and its components increases. This can cause poor performance and component failure. | Verify the ambient temperature of the server space against the environmental specifications for the server. If the temperature is not within the required operating range, remedy the situation immediately. | Periodically verify the ambient temperature of the server space to ensure that it is within the required range, especially if you made any changes to the server space (for example, added servers). The temperature must be consistent and stable. |
Airflow Blockage | The server cooling system uses fans to pull cool air in from the server front intake vents and exhaust warm air out the server back panel vents. If the front or back vents are blocked, the airflow through the server is disrupted and the cooling system fails to function properly, causing the server internal temperature to rise. | Inspect the server front and back panel vents for blockage from dust or debris. Inspect the server interior for improperly installed components or cables that can block the flow of air through the server. | Periodically inspect and clean the server vents using an ESD-certified vacuum cleaner. Ensure that all components, such as cards, cables, fans, air baffles, and dividers, are properly installed. Never operate the server without the top cover installed. |
Cooling Areas Compromised | The air baffle, component filler panels, and server top cover maintain and direct the flow of cool air through the server. These server components must be in place for the server to function as a sealed system. If these components are not installed correctly, the airflow inside the server can become chaotic and non-directional, which can cause server components to overheat and fail. | Inspect the server interior to ensure that the air baffle is properly installed. Ensure that all external-facing slots (storage drive, PCIe) are occupied with either a component or a component filler panel. Ensure that the server top cover is in place and sits flat and snug on top of the server. | When servicing the server, ensure that the air baffle is installed correctly and that the server has no unoccupied external-facing slots. Never operate the server without the top cover installed. |
Hardware Component Failure | | Investigate the cause of the overtemperature event, and replace failed components immediately. See Troubleshooting Server Hardware Faults. | Component redundancy is provided to allow for component failure in critical subsystems, such as the cooling subsystem. However, once a component in a redundant system fails, the redundancy no longer exists, and the risk for server shutdown and component failures increases. Therefore, it is important to maintain redundant systems and replace failed components immediately. |
Troubleshooting Power Issues
If your server does not power on, use the information in the following table to troubleshoot the issue.
Power Issue | Description | Action | Prevention |
---|---|---|---|
AC Power Connection | The AC power cords are the direct connection between the server power supplies and the power sources. The server power supplies need separate stable AC circuits. Insufficient voltage levels or fluctuations in power can cause server power problems. The power supplies operate at a particular voltage and within an acceptable range of voltage fluctuations. Refer to Electrical Requirements in Oracle Servers X9-2 and X9-2L Installation Guide. | | Use the AC power cord Velcro retaining clips and position the cords to minimize the risk of accidental disconnection. Ensure that the AC circuits that supply power to the server are stable and not overburdened. |
Power Supplies (PSUs) | The server power supply units (PSUs) provide the necessary server voltages from the AC power outlets. If the power supplies are inoperable, unplugged, or disengaged from the internal connectors, the server cannot power on. Note: Use the Velcro straps on the back of the server to secure the power cord connectors to the back of the power supplies. The Velcro retaining straps minimize the risk of accidental disconnection. | | When a power supply fails, replace it immediately. To ensure redundancy, the server has two power supplies. This redundant configuration prevents server downtime, or an unexpected shutdown, due to a failed power supply. The redundancy allows the server to continue to operate if one of the power supplies fails. However, when a server is being powered by a single power supply, the redundancy no longer exists, and the risk for downtime or an unexpected shutdown increases. When installing a power supply, ensure that it is fully seated and engaged with its connector inside the drive bay. A properly installed power supply has a lit green AC OK indicator. |
Top Cover | The server top cover maintains the air pressures inside the server, prevents accidental exposure to hazardous voltages, and protects internal components from physical and environmental damage. | Do not operate the server without the top cover installed unless you are hot-plugging a fan module, and then ensure that you complete the operation and replace the cover within 60 seconds. See Servicing Fan Modules (CRU) and Install the Server Top Cover. | Be careful to avoid bending or otherwise warping the top cover. |
Managing Server Hardware Faults Through the Oracle ILOM Fault Management Shell
The Oracle ILOM Fault Management Shell enables you to view and manage fault activity on managed servers and other types of devices.
For more information about how to use the Oracle ILOM Fault Management Shell, refer to the Oracle ILOM User's Guide for System Monitoring and Diagnostics Firmware Release 5.0.x at Oracle Integrated Lights Out Manager (ILOM) 5.0 Documentation.
Troubleshooting With Diagnostic Tools
The server and its accompanying software and firmware contain diagnostic tools and features that can help you isolate component problems, monitor the status of a functioning system, and exercise one or more subsystems to disclose more subtle or intermittent hardware-related problems.
Each diagnostic tool has its own specific strength and application. Review the tools listed in this section and determine which tool might be best to use for your situation. After you determine the tool to use, you can access it locally, while at the server, or remotely.
Diagnostic Tools
The selection of diagnostic tools available for your server ranges in complexity from a comprehensive validation test suite (Oracle VTS) to a chronological event log (Oracle ILOM event log). The selection of diagnostic tools also includes standalone software packages, firmware-based tests, and hardware-based LED indicators.
The following table summarizes the diagnostic tools that you can use when troubleshooting or monitoring your server.
Diagnostic Tool | Type | What It Does | Accessibility | Remote Capability |
---|---|---|---|---|
Oracle ILOM | SP firmware | Monitors environmental condition and component functionality sensors, generates alerts, performs fault isolation, and provides remote access. | Can function in either Standby power mode or Main power mode and is not OS dependent. | Remote and local access. |
Hardware-based LED indicators | Hardware and SP firmware | Indicates status of overall system and particular components. | Available when system power is available. | Local, but sensors and indicators are accessible from Oracle ILOM web interface or command-line interface (CLI). |
Power-On Self-Test (POST) | Host firmware | Tests core components of system: CPUs, memory, and motherboard I/O bridge integrated circuits. | Runs on startup. Available when the operating system is not running. | Local, but can be accessed through Oracle ILOM Remote System Console Plus. |
UEFI Diagnostics | SP firmware | Tests and detects problems on all processors, memory, disk drives, and network ports. | Use either the Oracle ILOM web interface or the command-line interface (CLI) to run UEFI diagnostics. | Remote access through Oracle ILOM Remote System Console Plus. |
Oracle ILOM SP/Diag shell | SP firmware | Allows you to run HWdiag commands to check the status of a system and its components, and access HWdiag logs. | Can function on Standby power and when operating system is not running. | Local, but remote serial access is possible if the SP serial port is connected to a network-accessible terminal server. |
Oracle Solaris commands | Operating system software | Displays various kinds of system information. | Requires operating system. | Local, and over network. |
Oracle Linux commands | Operating system software | Displays various kinds of system information. | Requires operating system. | Local, and over network. |
Oracle VTS | Diagnostic tool standalone software | Exercises and stresses the system, running tests in parallel. | Requires the Solaris operating system. Install Oracle VTS software separately. | View and control over network. |
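When the operating system is not running, only some of the tools in the preceding table are usable. The following minimal sketch encodes the Accessibility column as data and filters on it. The class and function names are hypothetical, and the entry for UEFI Diagnostics is an assumption: the table says only that it is launched from Oracle ILOM, so it is treated here as not requiring an operating system.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class DiagnosticTool:
    name: str
    requires_os: bool  # True if the table says an operating system is required


TOOLS = [
    DiagnosticTool("Oracle ILOM", requires_os=False),
    DiagnosticTool("Hardware-based LED indicators", requires_os=False),
    DiagnosticTool("Power-On Self-Test (POST)", requires_os=False),
    # Assumption: launched from Oracle ILOM, so no operating system is needed.
    DiagnosticTool("UEFI Diagnostics", requires_os=False),
    DiagnosticTool("Oracle ILOM SP/Diag shell", requires_os=False),
    DiagnosticTool("Oracle Solaris commands", requires_os=True),
    DiagnosticTool("Oracle Linux commands", requires_os=True),
    DiagnosticTool("Oracle VTS", requires_os=True),
]


def tools_without_os(tools: List[DiagnosticTool]) -> List[str]:
    """Return the names of tools that can be used when no OS is running."""
    return [tool.name for tool in tools if not tool.requires_os]


if __name__ == "__main__":
    for name in tools_without_os(TOOLS):
        print(name)
```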
Diagnostic Tool Documentation
The following table identifies where you can find documentation for more information about diagnostic tools.
Diagnostic Tool | Documentation | Location |
---|---|---|
Oracle ILOM | Oracle Integrated Lights Out Manager 5.0 Documentation Library | Oracle Integrated Lights Out Manager (ILOM) 5.0 Documentation |
UEFI Diagnostics or HWdiag | Oracle x86 Servers Diagnostics and Troubleshooting Guide | Oracle x86 Servers Administration, Diagnostics, and Applications Documentation |
System indicators and sensors | This document | Troubleshooting Using the Server Front and Back Panel Status Indicators |
Oracle VTS | Oracle VTS software and documentation | |
Attaching Devices to the Server
The following sections contain procedures for attaching devices to the server so you can access diagnostic tools when troubleshooting and servicing the server:
Attach Devices to the Server
This procedure explains how to connect devices to the server (remotely and locally), so that you can interact with the service processor (SP) and the server console.
Back Panel Connector Locations
The following illustration shows and describes the locations of the back panel connectors. Use this information to set up the server, so that you can access diagnostic tools and manage the server during service.

Callout | Cable Port or Expansion Slot | Description |
---|---|---|
1 | Power supply 0 input power; power supply 1 input power | The server has two power supply connectors, one for each power supply. Do not attach power cables to the power supplies until you finish connecting the data cables to the server. The server goes into Standby power mode, and the Oracle ILOM service processor initializes when the AC power cables are connected to the power source. System messages might be lost after 60 seconds if the server is not connected to a terminal, PC, or workstation. Note: Oracle ILOM signals a fault on any installed power supply that is not connected to an AC power source, as it might indicate a loss of redundancy. |
2 | Network management port (NET MGT) | The service processor NET MGT port is the optional connection to the Oracle ILOM service processor. The NET MGT port is configured by default to use Dynamic Host Configuration Protocol (DHCP). The service processor NET MGT port uses an RJ-45 cable for a 100/1000BASE-T connection. |
3 | Ethernet port (NET 0) | The Ethernet port enables you to connect the system to the network. The Ethernet port uses an RJ-45 cable for a 100/1000BASE-T connection. |
4 | USB port | The USB port supports hot-plugging. You can connect and disconnect a USB cable or a peripheral device while the server is running without affecting system operations. |
5 | Serial management port (SER MGT) | The service processor SER MGT port uses an RJ-45 cable and terminal (or emulator) to provide access to the Oracle ILOM command-line interface (CLI). Using Oracle ILOM, you can configure it to connect to the system console. Note: The serial management port does not support network connections. |
Configuring Serial Port Sharing
By default, the service processor (SP) controls the serial management (SER MGT) port and uses it to redirect the host serial console output. Using Oracle ILOM, you can assign the host console (COM1) as owner of the SER MGT port output, which allows the host console to output information directly to the SER MGT port. Serial port sharing is useful for Windows kernel debugging, because you can view non-ASCII character traffic output from the host console.
Set up the network on the SP before attempting to change the serial port owner to the host server. If the network is not set up first, and you switch the serial port owner to the host server, you cannot connect using the CLI or web interface to change the serial port owner back to the SP. To return the serial port owner setting to the SP, restore access to the serial port on the server. For details, refer to the Oracle Integrated Lights Out Manager (ILOM) 5.0 Documentation Library at Oracle Integrated Lights Out Manager (ILOM) 5.0 Documentation.
If you accidentally lose access to Oracle ILOM, contact Oracle Service and follow the process to return the serial port ownership back to the SP.
You can assign serial port output using either the Oracle ILOM CLI or the web interface, as described in the following sections:
Server Operating System Names for the NVMe Storage Drives
If NVMe storage drives are installed in the server front panel, they are labeled NVMe0 through NVMe11. The server operating systems assign these storage drives different names. For the corresponding names assigned by the operating systems, see the following table. The drive names provided in the table assume that:
-
Oracle Retimer card is installed in PCIe slot 10 and the cabling to the disk backplane is correct
-
NVMe cabling between the motherboard and the disk backplane is correct
Storage Drive Labels | Names Assigned by the Server Operating Systems |
---|---|
NVMe0 | PCIe Slot 100 |
NVMe1 | PCIe Slot 101 |
NVMe2 | PCIe Slot 102 |
NVMe3 | PCIe Slot 103 |
NVMe4 | PCIe Slot 104 |
NVMe5 | PCIe Slot 105 |
NVMe6 | PCIe Slot 106 |
NVMe7 | PCIe Slot 107 |
NVMe8 | PCIe Slot 108 |
NVMe9 | PCIe Slot 109 |
NVMe10 | PCIe Slot 110 |
NVMe11 | PCIe Slot 111 |
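The table follows a simple pattern: the operating system name is PCIe Slot 100 plus the drive index. The following minimal sketch expresses that pattern. The function name is hypothetical, and the mapping holds only under the cabling and Retimer card assumptions listed before the table.

```python
import re


def nvme_label_to_os_name(label: str) -> str:
    """Map a front panel label (NVMe0..NVMe11) to its operating system name."""
    match = re.fullmatch(r"NVMe(\d+)", label)
    if match is None:
        raise ValueError(f"not an NVMe drive label: {label!r}")
    index = int(match.group(1))
    if not 0 <= index <= 11:
        raise ValueError(f"drive index out of range 0-11: {index}")
    # The table assigns slot numbers 100 through 111 in label order.
    return f"PCIe Slot {100 + index}"


if __name__ == "__main__":
    for i in range(12):
        label = f"NVMe{i}"
        print(label, "->", nvme_label_to_os_name(label))
```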
Ethernet Device Naming
This section contains information about the device naming for the one Gigabit Ethernet port (labeled NET 0) on the back panel of the server. See Back Panel Connector Locations.
Ethernet Port Device Naming
The device naming for the Ethernet interface is reported differently by different interfaces and operating systems. The following table shows the BIOS (physical) and operating system (logical) naming convention for the interface. The device naming convention might vary, depending on the conventions of your operating system and which devices are installed in the server.
Note:
Naming used by the interfaces might be different from the names in the following table, depending on which devices are installed in the system.
Port | Oracle Solaris | Oracle Linux 7 and 8 | Windows (example default name, see note below) |
---|---|---|---|
NET 0 | igb0 | eno1 | Ethernet |
Note:
For Windows, a port name such as Ethernet is used by default. However, actual port naming is based on the order of enumeration, typically during operating system installation. Additionally, Windows allows you to rename the ports to meet application-specific needs.
MAC Address Mapping to Ethernet Ports
A system serial label that displays the MAC ID (and the associated barcode) for the server is attached to the top, front left side of the Oracle Server X9-2L disk cage bezel.
This MAC ID (and barcode) corresponds to a hexadecimal (base 16) MAC address for a sequence of six consecutive MAC addresses. These six MAC addresses correspond to the server network ports, as shown in the following table.
Base MAC Address | Corresponding Ethernet Port |
---|---|
“base” + 0 | NET 0 |
“base” + 1 | Unassigned |
“base” + 2 | Unassigned |
“base” + 3 | Unassigned |
“base” + 4 | SP (NET MGT) |
“base” + 5 | Used only when Network Controller-Sideband Interface (NC-SI) sideband management is configured. |
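The offsets in the table are plain hexadecimal addition on the 48-bit base address. The following minimal sketch shows the arithmetic. The function names are hypothetical, and the base address in the example is taken from the IANA documentation range, not from a real server label.

```python
# Offsets from the table; "base" + 1 through "base" + 3 are unassigned.
PORT_OFFSETS = {
    "NET 0": 0,
    "SP (NET MGT)": 4,
    "NC-SI sideband": 5,
}


def mac_with_offset(base_mac: str, offset: int) -> str:
    """Add an offset to a colon-separated MAC address (hexadecimal)."""
    value = int(base_mac.replace(":", ""), 16) + offset
    if value >= 1 << 48:
        raise ValueError("result does not fit in a 48-bit MAC address")
    raw = f"{value:012X}"
    return ":".join(raw[i:i + 2] for i in range(0, 12, 2))


def port_macs(base_mac: str) -> dict:
    """Return the MAC address for each assigned port, given the base."""
    return {port: mac_with_offset(base_mac, off)
            for port, off in PORT_OFFSETS.items()}


if __name__ == "__main__":
    # Hypothetical base address for illustration only.
    for port, mac in port_macs("00:00:5E:00:53:00").items():
        print(port, mac)
```

Because the addition is done on the whole 48-bit value, a carry out of the last octet propagates correctly into the next one.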
Getting Help
The following sections describe how to get additional help to resolve server-related problems.
Contacting Support
If the troubleshooting procedures in this chapter do not solve your problem, use the following table to collect information that you might need to communicate to Oracle Support.
System Configuration Information Needed | Your Information |
---|---|
Service contract number | |
System model | |
Operating environment | |
System serial number | |
Peripherals attached to the system | |
Email address and phone number for you and a secondary contact | |
Street address where the system is located | |
Superuser password | |
Summary of the problem and the work being done when the problem occurred | |
Other Useful Information | |
IP address | |
Server name (system host name) | |
Network or internet domain name | |
Proxy server configuration | |
Locating the Chassis Serial Number
You might need your server serial number when you ask for service on your system. Record this number for future use. Use one of the following resources or methods to locate your server serial number.
-
The serial number is located on the Radio-frequency Identification (RFID) label on the bottom left side of the front panel bezel, below the general status LEDs.
For illustrations of the server front panel, see Front and Back Panel Components.
-
The serial number is recorded on a label that is attached to the top front surface of the system.
-
The serial number is recorded on the yellow Customer Information Sheet (CIS) that is attached to your server packaging.
-
Using Oracle ILOM:
-
From the web interface, view the serial number on the System Information screen.
-
From the command-line interface (CLI), type the command:
show /System
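If you capture the output of the command to a text file or session log, the serial number can be extracted with a short script. The following is a minimal sketch under stated assumptions: it assumes the output lists properties as `name = value` lines and that the property is named `serial_number`. The function name and the sample text are hypothetical, so verify them against the output of your Oracle ILOM firmware.

```python
import re
from typing import Optional


def extract_serial_number(show_output: str) -> Optional[str]:
    """Return the value of the serial_number property, or None if absent."""
    for line in show_output.splitlines():
        # Anchored at the start of the line, so other properties whose
        # names merely end in "serial_number" are not matched.
        match = re.match(r"\s*serial_number\s*=\s*(\S+)", line)
        if match:
            return match.group(1)
    return None


# Hypothetical captured output, for illustration only.
SAMPLE = """\
 /System
    Properties:
        health = OK
        serial_number = 0000XXX000
"""

if __name__ == "__main__":
    print(extract_serial_number(SAMPLE))
```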
Auto Service Requests
Oracle Auto Service Requests (ASR) is available at no additional cost to customers with Oracle Premier Support. Oracle ASR is the fastest way to restore system availability if a hardware fault occurs. Oracle ASR software is secure and customer installable, with the software and documentation downloadable at My Oracle Support. When you log in to My Oracle Support, refer to the "Oracle Auto Service Request" knowledge article document (ID 1185493.1) for instructions on downloading the Oracle ASR software.
When a hardware fault is detected, Oracle ASR opens a service request with Oracle and transfers electronic fault telemetry data to help expedite the diagnostic process. Oracle diagnostics analyze the telemetry data for known issues and deliver immediate corrective actions. For security, the electronic diagnostic data sent to Oracle includes only what is needed to solve the problem. The software does not use any incoming Internet connections and does not include any remote access mechanisms.
For more information about Oracle ASR, go to Oracle Premier Support.