About Alarms
This section describes system-level alarms. Alarms play a significant role in determining the overall health of the system. For additional information, see the Oracle Communications Session Border Controller MIB Reference Guide.
Overview
An alarm is triggered when a condition or event occurs in either the system’s hardware or its software. Each alarm contains an alarm code, a severity level, a textual description of the event, and the time the event occurred; high-severity alarms also carry trap information.
The system’s alarm handler processes an alarm by identifying the alarm ID for the alarm condition and then looking up that condition in an alarm table. The alarm table lists all of the actions required to follow up on the alarm.
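The lookup step can be pictured as a table keyed by alarm ID. The following is a minimal sketch, assuming a simple in-memory dictionary; the alarm IDs, descriptions, severities, and action names (`ALARM_TABLE`, `handle_alarm`, `"snmp_trap"`, and so on) are hypothetical and do not reflect the system's internal table.

```python
from dataclasses import dataclass


# Hypothetical alarm table entry; the real table layout is not documented here.
@dataclass(frozen=True)
class AlarmTableEntry:
    description: str
    severity: str
    actions: tuple[str, ...]


# Hypothetical alarm IDs and follow-up actions, for illustration only.
ALARM_TABLE: dict[int, AlarmTableEntry] = {
    0x00010001: AlarmTableEntry(
        "fan failure", "major", ("display", "syslog", "snmp_trap", "dry_contact")
    ),
    0x00020001: AlarmTableEntry(
        "CPU utilization over threshold", "warning", ("syslog",)
    ),
}


def handle_alarm(alarm_id: int) -> tuple[str, ...]:
    """Look up an alarm ID in the table and return the actions to perform."""
    entry = ALARM_TABLE.get(alarm_id)
    if entry is None:
        # A condition with no table entry has no follow-up actions in this sketch.
        return ()
    return entry.actions
```

For example, `handle_alarm(0x00020001)` returns `("syslog",)` under these assumed entries.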
Types of Alarms
The system can generate the following types of alarms:
- Hardware alarms: generated when a problem with the chassis occurs.
- System alarms: account for system resource and redundancy issues, for example, CPU utilization over threshold, high memory utilization, a health score below threshold, or a suspended task. They also cover failures of low-level system calls (for example, when not enough memory is available).
- Network alarms: can occur when the software is unable to communicate with the hardware.
- Application alarms: account for application issues (for example, problems that involve protocols). These protocols include:
  - SIP
  - RADIUS
Application alarms also include security breaches, session failures, and problems related to accounting.
About the Alarm Process
An alarm is triggered when a condition or event occurs in either the hardware or the software. Each alarm contains the following elements:
- Alarm ID: a unique 32-bit integer made up of a 16-bit category name or number and a 16-bit identifier that is unique to the error or failure within that category.
- Severity: how severe the condition or failure is to the system.
- Character string: a textual description of the event or condition.
- Trap information: not included in every alarm; it is sent only for events of higher severity. See the Oracle Communications Session Border Controller MIB Reference Guide for more information.
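The 32-bit alarm ID described above can be built from, and split back into, its two 16-bit halves. This is a minimal sketch that assumes the category occupies the high 16 bits and the identifier the low 16 bits; this guide does not state the bit order, and the function names are hypothetical.

```python
def make_alarm_id(category: int, code: int) -> int:
    """Pack a 16-bit category and a 16-bit identifier into a 32-bit alarm ID.

    Assumption: the category is stored in the high 16 bits.
    """
    if not (0 <= category <= 0xFFFF and 0 <= code <= 0xFFFF):
        raise ValueError("category and code must each fit in 16 bits")
    return (category << 16) | code


def split_alarm_id(alarm_id: int) -> tuple[int, int]:
    """Return (category, code) from a 32-bit alarm ID."""
    if not 0 <= alarm_id <= 0xFFFFFFFF:
        raise ValueError("alarm ID must fit in 32 bits")
    return alarm_id >> 16, alarm_id & 0xFFFF
```

Under this assumed layout, category 2 with identifier 1 yields alarm ID `0x00020001` (131073), and splitting that value returns `(2, 1)`.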
About Alarms and the Health Score
The Oracle Communications Session Border Controller health score determines the active and standby roles of the Oracle Communications Session Border Controllers in a High Availability (HA) pair. The healthier peer (the peer with the higher health score) is the active peer; the peer with the lower health score is the standby peer.
The health score is based on a 100-point scoring system. When all system components are functioning properly, the health score of the system is 100.
Alarms play a significant role in determining the health score of an HA Oracle Communications Session Border Controller. Some alarm conditions have a corresponding health value, which is subtracted from the health score of the system when that alarm occurs. When that alarm is cleared or removed, the corresponding health value is added back to the system’s health score.
If a key system task (for example, a process or daemon) fails, the health score of that HA Oracle Communications Session Border Controller might be decremented by 75 points, depending on the system configuration. These situations, however, do not have a corresponding system alarm.
When an alarm condition is cleared or removed, this action has a positive impact on the health score of a system.
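The subtract-and-restore behavior can be modeled as a 100-point score minus the health values of the currently active alarms. This is a minimal sketch under stated assumptions: the `HealthScore` class is hypothetical, the health values used below are illustrative, and clamping the score at 0 is an assumption rather than documented behavior.

```python
class HealthScore:
    """Track a 100-point health score as alarms are raised and cleared.

    Hypothetical model: each alarm's health value is subtracted while the
    alarm is active and added back when the alarm is cleared.
    """

    MAX_SCORE = 100

    def __init__(self) -> None:
        self._active: dict[int, int] = {}  # alarm ID -> health value

    @property
    def score(self) -> int:
        # Clamping at 0 is an assumption for this sketch.
        return max(0, self.MAX_SCORE - sum(self._active.values()))

    def raise_alarm(self, alarm_id: int, health_value: int) -> None:
        # Raising an alarm that is already active does not subtract twice.
        self._active[alarm_id] = health_value

    def clear_alarm(self, alarm_id: int) -> None:
        # Clearing restores the alarm's health value; unknown IDs are ignored.
        self._active.pop(alarm_id, None)
```

For example, raising alarms with health values 10 and 25 drops the score from 100 to 65; clearing the first alarm brings it back to 75.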
Displaying and Clearing Alarms
You display and clear alarms using the following ACLI commands:
- display-alarms
- clear-alarm
The clear-alarm command is available only in Superuser mode; you must have that privilege level to clear alarms.
Clearing Alarms
When an alarm condition is corrected, the corresponding alarm is cleared from the system’s alarm table and the associated health value is restored. You can also clear a specific alarm manually by issuing the clear-alarm ACLI command.
To clear a specific system alarm:
About the Alarm Display on the Chassis
The alarm display appears on the front panel in a two-line display mode. During an alarm condition, the alarm display replaces the standard display on the chassis.
The first line of the graphic display shows the number of hardware-related alarms, if any. The second line of the graphic display shows the number of link-related alarms, if any. For example:
1 HW ALARM
2 LINK ALARMS
If the graphic display window indicates an alarm condition, the system administrator must determine the nature of the condition by running the display-alarms ACLI command, which shows specific details about each alarm.
When an alarm condition is cleared, the standard display replaces the alarm display.
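The display behavior described in this section (hardware count on the first line, link count on the second, and a return to the standard display once alarms clear) can be sketched as follows. The function name and the choice to leave a line blank when its count is zero are assumptions; the chassis firmware is not documented at this level.

```python
def front_panel_lines(
    hw_alarms: int, link_alarms: int, standard: tuple[str, str]
) -> tuple[str, str]:
    """Return the two lines shown on the front panel (hypothetical sketch)."""
    if hw_alarms == 0 and link_alarms == 0:
        # No active alarms: the standard display is shown.
        return standard

    def label(count: int, kind: str) -> str:
        if count == 0:
            return ""  # Assumption: the line is left blank when the count is zero.
        return f"{count} {kind} ALARM{'' if count == 1 else 'S'}"

    # Line 1: hardware-related alarms; line 2: link-related alarms.
    return (label(hw_alarms, "HW"), label(link_alarms, "LINK"))
```

With one hardware alarm and two link alarms, this returns `("1 HW ALARM", "2 LINK ALARMS")`, matching the example above.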
Alarm Severity Levels
Five levels of alarm severity have been established for the system. These levels let the system take action that is appropriate to the situation.
Alarm Severity | Description |
---|---|
Emergency | Requires immediate attention. If you do not attend to this condition immediately, there will be physical, permanent, and irreparable damage to your system. |
Critical | Requires attention as soon as it is noted. If you do not attend to this condition immediately, there may be physical, permanent, and irreparable damage to your system. |
Major | Functionality has been seriously compromised. As a result, this situation might cause loss of functionality, hanging applications, and dropped packets. If you do not attend to this situation, your system will suffer no physical harm, but it will cease to function. |
Minor | Functionality has been impaired to a certain degree. As a result, you might experience compromised functionality. There will be no physical harm to your system. However, you should attend to this type of alarm as soon as possible in order to keep your system operating properly. |
Warning | Some irregularities in performance. This condition describes situations that are noteworthy; however, you should attend to this condition to keep your system operating properly. For example, this type of alarm might indicate that the system is running low on bandwidth and that you may need to contact Oracle to arrange for an upgrade. |
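Because the five levels form an ordered scale, code that filters or escalates alarms can treat them as comparable values. This is a minimal sketch using Python's `IntEnum`; the numeric values are assumptions chosen only to express the ordering and are not the values defined in the MIB.

```python
from enum import IntEnum


class Severity(IntEnum):
    """Alarm severity levels; a higher value means a more severe alarm.

    The numeric values are illustrative assumptions, not MIB values.
    """

    WARNING = 1
    MINOR = 2
    MAJOR = 3
    CRITICAL = 4
    EMERGENCY = 5


def at_least(severity: Severity, threshold: Severity) -> bool:
    """Return True if an alarm is at or above the given severity threshold."""
    return severity >= threshold
```

For example, `at_least(Severity.MINOR, Severity.WARNING)` is true, while `at_least(Severity.WARNING, Severity.MINOR)` is false.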
System Response to Alarms
The system can take a range of actions when an alarm event occurs. It can present the alarm in the VED graphic display window on the front panel of the chassis, write the event to acmelog (syslog) for logging off the system, generate an SNMP trap with an event notification, or use three dry contacts for external alarming.
Within the system, a database defines which actions to take for an event of a given category and severity. The following sections describe these actions.
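Conceptually, the action database maps a (category, severity) pair to a set of actions drawn from the four described here: front panel display, syslog, SNMP trap, and dry contact. This is a minimal sketch; the entries in `ACTION_DB` and the fallback to logging only are hypothetical and do not reflect the system's actual policy.

```python
from enum import IntEnum


class Severity(IntEnum):
    # Assumed ordering: a higher value is more severe.
    WARNING = 1
    MINOR = 2
    MAJOR = 3
    CRITICAL = 4
    EMERGENCY = 5


# Hypothetical action database keyed by (category, severity).
ACTION_DB: dict[tuple[str, Severity], frozenset[str]] = {
    ("hardware", Severity.CRITICAL): frozenset(
        {"display", "syslog", "snmp_trap", "dry_contact"}
    ),
    ("hardware", Severity.MINOR): frozenset({"display", "syslog", "dry_contact"}),
    ("application", Severity.WARNING): frozenset({"syslog"}),
}

# Assumed fallback when no entry exists for a category and severity.
DEFAULT_ACTIONS = frozenset({"syslog"})


def actions_for(category: str, severity: Severity) -> frozenset[str]:
    """Return the actions configured for an event category and severity."""
    return ACTION_DB.get((category, severity), DEFAULT_ACTIONS)
```

Keying on both fields, rather than on severity alone, lets the same severity trigger different actions for hardware and application events.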
Writing to syslog (acmelog)
The term syslog refers to the protocol used for network logging of system and network events. Because syslog transmits event notification messages across networks, it can be used to provide remote log access.
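In the syslog protocol as specified in RFC 5424, each message begins with a PRI value computed as facility × 8 + severity. The sketch below formats a minimal RFC 5424 message for an alarm. The mapping from alarm severities to syslog severities, the `local0` facility, and the `acmelog` application name are assumptions for illustration, not the system's documented output format.

```python
# RFC 5424 facility code for local0.
LOCAL0 = 16

# Hypothetical mapping from alarm severity to RFC 5424 syslog severity
# (0 = Emergency, 2 = Critical, 3 = Error, 4 = Warning, 5 = Notice).
SYSLOG_SEVERITY = {
    "emergency": 0,
    "critical": 2,
    "major": 3,
    "minor": 4,
    "warning": 5,
}


def syslog_pri(facility: int, severity: int) -> int:
    """Compute the RFC 5424 PRI value: facility * 8 + severity."""
    if not (0 <= facility <= 23 and 0 <= severity <= 7):
        raise ValueError("facility must be 0-23 and severity 0-7")
    return facility * 8 + severity


def format_syslog(
    alarm_severity: str,
    hostname: str,
    text: str,
    app_name: str = "acmelog",
    facility: int = LOCAL0,
) -> str:
    """Build a minimal RFC 5424 message.

    TIMESTAMP, PROCID, MSGID, and STRUCTURED-DATA are sent as the nil value "-".
    """
    pri = syslog_pri(facility, SYSLOG_SEVERITY[alarm_severity])
    return f"<{pri}>1 - {hostname} {app_name} - - - {text}"
```

For a critical alarm on a host named `sbc1`, this produces `<130>1 - sbc1 acmelog - - - fan failure`, since 16 × 8 + 2 = 130. Such a message would typically be sent to a remote collector over UDP port 514.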
Sending SNMP Traps
An SNMP trap is essentially an event notification that can be initiated by tasks (such as the notify task), by log messages, or by alarm reporting. When an event occurs, the Oracle Communications Session Border Controller sends a trap to the management station.
Although there is no direct correlation between system alarms and the generation of SNMP traps, there is a correlation between system alarms and the MIBs that support SNMP traps. For a list of the SNMP-related alarms and their associated traps, refer to the Oracle Communications Session Border Controller MIB Reference Guide.
About Dry Contacts
The system supports three relays, called dry contacts, at the back of the Oracle Communications Session Border Controller chassis for transmitting alarms. A dry contact is triggered for the following severity levels:
- Critical
- Major
- Minor
Most often, the dry contact action is signaled at the physical location of the chassis; for example, by an LED on a communications cabinet.
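The trigger rule above (Critical, Major, and Minor alarms only) can be expressed as a small lookup. This is a minimal sketch that assumes one relay per severity; this section does not describe how the three relays are actually assigned, so `RELAY_FOR_SEVERITY` and `relay_to_close` are hypothetical.

```python
from typing import Optional

# Severities that trigger a dry contact, per the list above.
DRY_CONTACT_SEVERITIES = {"critical", "major", "minor"}

# Hypothetical assignment of one relay per severity.
RELAY_FOR_SEVERITY = {"critical": 1, "major": 2, "minor": 3}


def relay_to_close(severity: str) -> Optional[int]:
    """Return the relay to close for an alarm, or None if no dry contact fires."""
    severity = severity.lower()
    if severity not in DRY_CONTACT_SEVERITIES:
        # Emergency and Warning do not appear in the list above.
        return None
    return RELAY_FOR_SEVERITY[severity]
```

For example, a Major alarm closes relay 2 under this assumed wiring, and a Warning alarm closes none.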