Troubleshoot Hardware Faults Using Oracle ILOM CLI

This procedure uses the basic troubleshooting steps described in Basic Troubleshooting Process.

Use this procedure to troubleshoot hardware faults using the Oracle ILOM command-line interface (CLI) and, if necessary, prepare the server for service.

  1. Open a terminal and, using a secure method such as Secure Shell (SSH), log in to the service processor (SP). Use an account with administrator privileges and the SP IP address or host name. For example:

    ssh username@hostname

  2. When prompted, enter the password.
  3. At the Oracle ILOM prompt (->), enter the command to show any faults. For example:
    -> show faulty
    Target                   | Property | Value
    -------------------------+----------+------------------------------------
    /SP/faultmgmt/0          | fru      | /SYS/MB/P0
    /SP/faultmgmt/0/faults/0 | class    | fault.cpu.cache.uncorrectable.error

    In this example, the fault shows that Processor 0 (/SYS/MB/P0) encountered an uncorrectable cache error.

  4. To get more information, enter the command to view Open Problems:
    -> show System/Open_Problems
    
    Open Problems (1)
    Date/Time                 Subsystems          Component
    ------------------------  ------------------  ------------
    Wed May 16 18:00:39 2023  Processor, Last Level Cache, P0(CPU 0)
            A non-recoverable cache failure was detected by the device while
            performing a command. (Probability:100,
            UUID:f9c9d6d6-5c42-6f7d-c2c0-857962de2ce5,
            Resource:/SYS/MB/P0, Part Number:N/A, Serial Number:N/A,
            Reference Document:http://support.oracle.com/msg/ISTOR-1234-5H)

    The Open Problems listing provides detailed information, such as the time the event occurred, the component and subsystem name, and a description of the issue. It also links to an Oracle KnowledgeBase article that describes possible resolution steps.

    Tip:

    The System Log provides a chronological list of all the system events and faults that have occurred since the log was last reset and includes additional information, such as severity levels and error counts. To access the System Log, type: show /System/Log/list
  5. Before accessing the physical server, review Known Issues for information related to the issue or the component.

    The Oracle AMD-Based Cloud Server Product Notes contain up-to-date information about the server, including hardware-related issues. In addition to checking the product notes, follow the link to the Oracle KnowledgeBase article shown in the Open Problems listing.

  6. To prepare the server for service, see Preparing for Service.
  7. Service the component.

    After servicing the component, you might need to clear the fault in Oracle ILOM. For more information, refer to the service procedures for the component. See also Monitoring Component Health and Faults Using Oracle ILOM and the Oracle ILOM Documentation.
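
If you save the output of show faulty from step 3 to a text file or log, a short script can turn it into structured data for a service ticket. The following is a minimal sketch, not an Oracle tool: the function name parse_show_faulty is hypothetical, and it assumes the output uses the pipe-separated Target | Property | Value layout shown in the example above.

```python
def parse_show_faulty(text):
    """Group 'Target | Property | Value' rows by target.

    Assumes the pipe-separated layout from the example in step 3;
    this is an illustrative helper, not part of Oracle ILOM.
    """
    faults = {}
    for line in text.splitlines():
        # Split into at most three columns; the header separator row
        # uses '+' rather than '|' and is skipped by the length check.
        parts = [p.strip() for p in line.split("|", 2)]
        if len(parts) != 3:
            continue
        target, prop, value = parts
        # Skip the column header; real targets are absolute paths.
        if not target.startswith("/"):
            continue
        faults.setdefault(target, {})[prop] = value
    return faults


# Sample text copied from the example in step 3.
SAMPLE = """\
-> show faulty
Target                   | Property | Value
-------------------------+----------+------------------------------------
/SP/faultmgmt/0          | fru      | /SYS/MB/P0
/SP/faultmgmt/0/faults/0 | class    | fault.cpu.cache.uncorrectable.error
"""

faults = parse_show_faulty(SAMPLE)
for target, props in faults.items():
    for name, value in props.items():
        print(f"{target}: {name} = {value}")
```

The result maps each fault target to its properties, so the affected FRU and the fault class can be looked up by path, for example faults["/SP/faultmgmt/0"]["fru"].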