Software Worker Threads Watchdog Timer and Health Check Trap

The Oracle Communications Session Border Controller monitors specific software threads for faults and provides the user with configurable actions to take in case of thread failure. The system registers applicable threads to this watchdog and assumes a thread has failed when it does not respond. By default, the Oracle Communications Session Border Controller generates information about the event and reboot history. For HA configurations, the system synchronizes this watchdog configuration and runs the watchdog simultaneously on both the active and standby Oracle Communications Session Border Controllers.

You can query the system to show the threads currently being monitored with the show platform health-check command. The output includes these columns:

Name: Name of the thread that registered with HealthCheck
Count: Health count of the thread
State: State of the thread, one of STOPPED, RUNNING, or EXCLUDE
Duration: Stop expire time in seconds. Shows 0 for the RUNNING and EXCLUDE states.

ORACLE# show platform health-check
------------------------------------------------
Name             Count  STATE    DURATION
------------------------------------------------
tLrtd            3      RUNNING  0
lrtdWorkerThrea  3      RUNNING  0
dnsWorker01      3      RUNNING  0
loseld           3      RUNNING  0
npsoft           3      RUNNING  0
tFlowGdTmr       3      RUNNING  0
tLemd            3      RUNNING  0
tServiceHealth   3      RUNNING  0
tAtcpd           3      RUNNING  0
atcpd02          0      EXCLUDE  0
atcpd01          0      EXCLUDE  0
[...]
------------------------------------------------
Total Displayed: 39
------------------------------------------------
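
If you capture this output from a session, a short script can pick out the threads that are not running. The following is a minimal Python sketch, not an Oracle-provided tool: the names ThreadHealth, parse_health_check, and stopped_threads are hypothetical, and it assumes each data row consists of exactly four whitespace-separated fields, as in the sample above.

```python
from typing import List, NamedTuple


class ThreadHealth(NamedTuple):
    """One data row of `show platform health-check` output."""
    name: str
    count: int
    state: str
    duration: int


# States listed in the column description above.
VALID_STATES = {"STOPPED", "RUNNING", "EXCLUDE"}


def parse_health_check(output: str) -> List[ThreadHealth]:
    """Return the data rows, skipping the prompt, rulers, header, and footer."""
    rows = []
    for line in output.splitlines():
        fields = line.split()
        # Assumption: a data row is name, count, state, duration.
        if len(fields) != 4 or fields[2] not in VALID_STATES:
            continue
        name, count, state, duration = fields
        if not (count.isdigit() and duration.isdigit()):
            continue
        rows.append(ThreadHealth(name, int(count), state, int(duration)))
    return rows


def stopped_threads(rows: List[ThreadHealth]) -> List[str]:
    """Names of threads reported in the STOPPED state."""
    return [row.name for row in rows if row.state == "STOPPED"]


# A few rows copied from the sample output above.
SAMPLE = """\
ORACLE# show platform health-check
------------------------------------------------
Name Count STATE DURATION
------------------------------------------------
tLrtd 3 RUNNING 0
dnsWorker01 3 RUNNING 0
atcpd02 0 EXCLUDE 0
atcpd01 0 EXCLUDE 0
------------------------------------------------
Total Displayed: 39
"""

if __name__ == "__main__":
    for row in parse_health_check(SAMPLE):
        print(row.name, row.state)
```

Because the parser splits on whitespace, it works whether or not the columns are padded for alignment.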

When an applicable thread is not responding, the Oracle Communications Session Border Controller's default behavior includes:

Generate a log message
Issue an alarm
Issue an SNMP trap
Generate a core dump
Reboot

The user configures the Software Worker Threads Watchdog action by setting the sw-health-check-action option in the system-config to one of the following values:

logonly — Generate log message only
logandreboot — Generate log message and reboot
logcoreandreboot — Generate log message, generate a core dump and reboot [default]

By default, the system checks thread status every 16 seconds. The user can change this interval with the task-health-check-time option configured in the system-config.
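
The relationship between thread registration, the check interval, and the configured action can be illustrated with a small conceptual model. This Python sketch is not the product's implementation, which the text does not describe: the Watchdog class, its heartbeat method, and the rule that a thread is unresponsive once it has been silent for longer than one check interval are all assumptions made for illustration. Only the three action values and the 16-second default come from the text.

```python
import time
from typing import Callable, Dict, List

# Action values from the text; the step names are illustrative labels only.
ACTIONS: Dict[str, List[str]] = {
    "logonly": ["log"],
    "logandreboot": ["log", "reboot"],
    "logcoreandreboot": ["log", "core-dump", "reboot"],
}


class Watchdog:
    """Hypothetical model of a watchdog that tracks registered threads."""

    def __init__(self, check_interval: float = 16.0,
                 action: str = "logcoreandreboot",
                 clock: Callable[[], float] = time.monotonic) -> None:
        if action not in ACTIONS:
            raise ValueError(f"unknown action: {action}")
        self.check_interval = check_interval
        self.action = action
        self.clock = clock
        self.last_seen: Dict[str, float] = {}

    def register(self, name: str) -> None:
        """Only registered threads are monitored."""
        self.last_seen[name] = self.clock()

    def heartbeat(self, name: str) -> None:
        """Record that a registered thread responded."""
        if name in self.last_seen:
            self.last_seen[name] = self.clock()

    def check(self) -> Dict[str, List[str]]:
        """Map each unresponsive thread to the steps the action implies.

        Assumption: a thread counts as unresponsive when it has been
        silent for longer than one check interval.
        """
        now = self.clock()
        return {name: ACTIONS[self.action]
                for name, seen in self.last_seen.items()
                if now - seen > self.check_interval}
```

The clock is injectable so the model can be exercised without waiting for real time to pass.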

When the system identifies an unresponsive thread, it sends out the following trap: apUsbcSysThreadNotRespondingTrap. This trap is defined within the apUsbc MIB. The system sends it once by default; this value can be overridden by the trap configuration. This function does not include a clear trap.

Be aware that the tHealthCheckd task monitors only the application tasks that are registered with it. It does not monitor any platform tasks.

Neither of these configuration options is real-time configurable; the user must reboot the system after changing either one.

Software Worker Thread Health Check Interval Configuration

Use this procedure to set the timing and action for the Software Worker Thread Health Check and Watchdog Timer.

Access the system-config configuration element.

ORACLE# configure terminal
ORACLE(configure)# system
ORACLE(system)# system-config
ORACLE(system-config)#

Type select to begin editing the system-config object.

ORACLE(system-config)# select
ORACLE(system-config)#

Set the task-health-check-time option to the preferred interval (in seconds).
```
ORACLE(system-config)# option +task-health-check-time=10
```
Set the sw-health-check-action option to the action the system takes on thread failure.
```
ORACLE(system-config)# option +sw-health-check-action=logonly
```
Type done to save your configuration.
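
If you generate configuration snippets from a script, the two option lines from this procedure can be built and checked before they are pasted into an ACLI session. The following Python sketch is a hypothetical helper, not part of the product: the function name health_check_option_lines is invented here, and because the text does not state a valid range for the interval, the helper only rejects non-positive values.

```python
from typing import List

# Values documented for sw-health-check-action.
VALID_ACTIONS = ("logonly", "logandreboot", "logcoreandreboot")


def health_check_option_lines(interval_seconds: int, action: str) -> List[str]:
    """Build the two `option` lines used in the procedure above."""
    # Assumption: the valid range is not documented here, so only
    # obviously invalid intervals are rejected.
    if interval_seconds <= 0:
        raise ValueError("interval must be a positive number of seconds")
    if action not in VALID_ACTIONS:
        raise ValueError(f"action must be one of {VALID_ACTIONS}")
    return [
        f"option +task-health-check-time={interval_seconds}",
        f"option +sw-health-check-action={action}",
    ]


if __name__ == "__main__":
    for line in health_check_option_lines(10, "logonly"):
        print(line)
```

Run with the values from the procedure, the helper prints the same two option lines shown in the steps above.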