Software Worker Threads Watchdog Timer and Health Check Trap
The Oracle Communications Session Border Controller monitors specific software threads for faults and provides the user with configurable actions to take in case of thread failure. The system registers applicable threads with this watchdog and assumes a thread has failed when it does not respond. By default, the Oracle Communications Session Border Controller generates information about the event and the reboot history. For HA configurations, the system synchronizes this watchdog configuration and runs the watchdog simultaneously on both the active and standby Oracle Communications Session Border Controllers.
To display the threads currently being monitored, use the show platform health-check command. Its output includes these columns:
- Name: name of the thread registered with HealthCheck
- Count: health count of the thread
- State: state of the thread: STOPPED, RUNNING, or EXCLUDE
- Duration: stop expiry time, in seconds; shows 0 for threads in the RUNNING or EXCLUDE state
ORACLE# show platform health-check
------------------------------------------------
Name             Count  STATE    DURATION
------------------------------------------------
tLrtd            3      RUNNING  0
lrtdWorkerThrea  3      RUNNING  0
dnsWorker01      3      RUNNING  0
loseld           3      RUNNING  0
npsoft           3      RUNNING  0
tFlowGdTmr       3      RUNNING  0
tLemd            3      RUNNING  0
tServiceHealth   3      RUNNING  0
tAtcpd           3      RUNNING  0
atcpd02          0      EXCLUDE  0
atcpd01          0      EXCLUDE  0
[...]
------------------------------------------------
Total Displayed: 39
------------------------------------------------
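For scripted monitoring, the command output can be parsed into records. The following is a minimal Python sketch, assuming the whitespace-separated column layout shown above (Name, Count, STATE, DURATION) with one thread per line. The names ThreadHealth, parse_health_check, and stopped_threads are hypothetical and not part of any Oracle tool; how the output is collected from the SBC is left out.

```python
from dataclasses import dataclass
from typing import List

# STATE values listed for the show platform health-check output.
VALID_STATES = {"STOPPED", "RUNNING", "EXCLUDE"}


@dataclass
class ThreadHealth:
    name: str
    count: int
    state: str
    duration: int  # seconds; 0 for RUNNING and EXCLUDE


def parse_health_check(output: str) -> List[ThreadHealth]:
    """Parse thread rows from 'show platform health-check' output.

    Assumes each thread row has exactly four whitespace-separated fields.
    The prompt, header, separator, '[...]', and summary lines are skipped
    because they do not match that shape.
    """
    rows = []
    for line in output.splitlines():
        fields = line.split()
        if len(fields) != 4:
            continue
        name, count, state, duration = fields
        if state not in VALID_STATES or not (count.isdigit() and duration.isdigit()):
            continue
        rows.append(ThreadHealth(name, int(count), state, int(duration)))
    return rows


def stopped_threads(rows: List[ThreadHealth]) -> List[str]:
    """Return the names of threads currently reported as STOPPED."""
    return [r.name for r in rows if r.state == "STOPPED"]
```

Filtering on the STATE column rather than on line position keeps the parser tolerant of extra separator or summary lines.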
When an applicable thread is not responding, the Oracle Communications Session Border Controller's default behavior includes:
- Generate a log message
- Issue an alarm
- Issue an SNMP trap
- Generate a core dump
- Reboot
The user sets the Software Worker Threads Watchdog action with the sw-health-check-action option in the system-config, using one of the following values:
- logonly — Generate log message only
- logandreboot — Generate log message and reboot
- logcoreandreboot — Generate log message, generate a core dump and reboot [default]
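The three values differ only in which of the logging, core-dump, and reboot steps run. The sketch below encodes that as a lookup table; it is a minimal illustration assuming the option descriptions above are complete. WatchdogAction, SW_HEALTH_CHECK_ACTIONS, and resolve_action are hypothetical names, and the alarm and SNMP trap from the default behavior list are not modeled, because the option descriptions do not say whether they vary with the value.

```python
from typing import NamedTuple, Optional


class WatchdogAction(NamedTuple):
    """Which steps run when a monitored thread stops responding."""
    log: bool
    core_dump: bool
    reboot: bool


# Hypothetical lookup table built from the sw-health-check-action descriptions.
SW_HEALTH_CHECK_ACTIONS = {
    "logonly": WatchdogAction(log=True, core_dump=False, reboot=False),
    "logandreboot": WatchdogAction(log=True, core_dump=False, reboot=True),
    "logcoreandreboot": WatchdogAction(log=True, core_dump=True, reboot=True),
}
DEFAULT_ACTION = "logcoreandreboot"  # documented default


def resolve_action(value: Optional[str] = None) -> WatchdogAction:
    """Map a configured value to its action; None (option unset) means the default."""
    key = DEFAULT_ACTION if value is None else value
    try:
        return SW_HEALTH_CHECK_ACTIONS[key]
    except KeyError:
        raise ValueError(f"unsupported sw-health-check-action: {value!r}") from None
```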
By default, the system checks thread status every 16 seconds. The user can change this interval with the task-health-check-time option configured in the system-config.
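Conceptually, this kind of watchdog resembles a heartbeat monitor: registered threads check in periodically, and a periodic sweep flags any registered thread that has not checked in recently. The following Python sketch illustrates that idea only and is not the SBC's implementation. HeartbeatWatchdog, its methods, the injectable clock, and the rule that a thread is unresponsive once its last check-in is older than one interval are all assumptions; the 16-second default mirrors the documented task-health-check-time default.

```python
import time
from typing import Callable, Dict, List

DEFAULT_CHECK_INTERVAL_S = 16  # documented default for task-health-check-time


class HeartbeatWatchdog:
    """Conceptual heartbeat watchdog (hypothetical, for illustration only).

    Only registered threads are tracked, mirroring the note that the
    health-check task ignores tasks that never registered with it.
    """

    def __init__(self, interval_s: float = DEFAULT_CHECK_INTERVAL_S,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.interval_s = interval_s
        self._clock = clock
        self._last_seen: Dict[str, float] = {}

    def register(self, name: str) -> None:
        # Registration counts as the thread's first check-in.
        self._last_seen[name] = self._clock()

    def heartbeat(self, name: str) -> None:
        # Check-ins from unregistered threads are ignored.
        if name in self._last_seen:
            self._last_seen[name] = self._clock()

    def sweep(self) -> List[str]:
        """Return registered threads whose last check-in is older than one interval."""
        now = self._clock()
        return sorted(n for n, seen in self._last_seen.items()
                      if now - seen > self.interval_s)
```

A real watchdog would call sweep() once per interval from its own thread and hand each returned name to the configured action; the injectable clock only keeps the sketch testable without waiting 16 seconds.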
When the system identifies an unresponsive thread, it sends the apUsbcSysThreadNotRespondingTrap, which is defined in the apUsbc MIB. By default, the system sends this trap once; the trap configuration can override this count. There is no corresponding clear trap.
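The trap behavior amounts to a send count per unresponsive-thread event (one by default) with no clear trap afterward. A minimal sketch under those assumptions follows; notify_thread_not_responding and the send callback are hypothetical stand-ins for the SBC's SNMP transport, and how the trap configuration sets send_count is not modeled.

```python
from typing import Callable

THREAD_NOT_RESPONDING_TRAP = "apUsbcSysThreadNotRespondingTrap"


def notify_thread_not_responding(
    thread: str,
    send: Callable[[str, str], None],
    send_count: int = 1,
) -> None:
    """Emit the not-responding trap send_count times for one event.

    send_count defaults to 1, the documented default. There is deliberately
    no matching 'clear' function, because no clear trap is defined.
    """
    if send_count < 1:
        raise ValueError("send_count must be at least 1")
    for _ in range(send_count):
        send(THREAD_NOT_RESPONDING_TRAP, thread)
```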
Be aware that the tHealthCheckd task monitors only the application tasks that are registered with it. It does not monitor any platform tasks.
None of these configuration options is real-time configurable; the user must reboot the system after changing any of them.