Fault Management Terminology
See the following table for common fault management terms and definitions.
Term | Description |
---|---|
Proactive self-healing | Proactive self-healing is a fault management architecture and methodology for automatically diagnosing, reporting, and handling software and hardware fault conditions. Proactive self-healing reduces the time required to debug a hardware or software problem and provides the system administrator or Oracle Services personnel with detailed data about each fault. The architecture consists of an event management protocol, the Fault Manager, fault-handling agents, and diagnosis engines. |
Diagnosis engines | The fault management architecture in Oracle ILOM includes diagnosis engines that broadcast fault events for detected system errors. For a list of the diagnosis engines that this architecture supports, see fmstat Report Example and Description. |
Health states | Oracle ILOM associates the following health states with every resource for which telemetry information has been received. The possible states presented in Oracle ILOM interfaces include: |
Fault | A fault indicates that a hardware component is present but is unusable or degraded because one or more problems have been diagnosed by the Oracle ILOM Fault Manager. The component has been disabled to prevent further damage to the system. |
FRU | A FRU is a field-replaceable unit (such as a drive, memory DIMM, or printed circuit board). |
CRU | A CRU is a customer-replaceable unit. |
Universally unique identifier (UUID) | A UUID is used to uniquely identify a problem across any set of systems. |
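
To make the relationship between these terms concrete, the following is a minimal Python sketch of how a diagnosis engine, a fault manager, and fault-handling agents could fit together. It is an illustration only and does not reflect the Oracle ILOM implementation or any Oracle API: the class and function names (`FaultEvent`, `FaultManager`, `diagnose`), the error-count threshold, and the resource name `example-dimm-0` are all hypothetical.

```python
import uuid
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass(frozen=True)
class FaultEvent:
    """A diagnosed problem, tagged with a UUID so it can be identified across systems."""
    resource: str   # hypothetical component name
    summary: str    # human-readable description of the diagnosis
    unit_type: str  # "FRU" or "CRU" (labels taken from the table above)
    problem_uuid: str = field(default_factory=lambda: str(uuid.uuid4()))


class FaultManager:
    """Toy stand-in for a fault manager: relays fault events to subscribed agents."""

    def __init__(self) -> None:
        self._agents: List[Callable[[FaultEvent], None]] = []

    def subscribe(self, agent: Callable[[FaultEvent], None]) -> None:
        # An agent is any callable that handles a fault event.
        self._agents.append(agent)

    def broadcast(self, event: FaultEvent) -> None:
        # Deliver the event to every agent in subscription order.
        for agent in self._agents:
            agent(event)


def diagnose(resource: str, unit_type: str, error_count: int,
             threshold: int = 3) -> Optional[FaultEvent]:
    """Toy diagnosis engine: report a fault once errors reach an assumed threshold."""
    if error_count < threshold:
        return None
    summary = f"{error_count} errors reached threshold {threshold}"
    return FaultEvent(resource, summary, unit_type)


if __name__ == "__main__":
    manager = FaultManager()
    # A simple fault-handling agent that reports each event.
    manager.subscribe(
        lambda e: print(f"[{e.problem_uuid}] {e.unit_type} {e.resource}: {e.summary}")
    )
    event = diagnose("example-dimm-0", "FRU", error_count=5)
    if event is not None:
        manager.broadcast(event)
```

Because each `FaultEvent` carries its own UUID, two diagnoses of the same component remain distinguishable, which mirrors the role the table assigns to the UUID.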