Converged Application Server Monitoring and Overload Protection
This chapter describes how to monitor Oracle Communications Converged Application Server and how to configure its overload protection.
About Monitoring and Overload Protection
Converged Application Server provides two interrelated systems that you can use together to ensure your environments remain within functional boundaries:
- SIP Server and Application Monitoring Console
- SIP Overload Protection
The first system, SIP Server and Application Monitoring Console, provides a window into the performance of your SIP servers and deployed SIP applications. Using the console, you can review the real-time performance of your servers and applications and spot possible bottlenecks and impending failure conditions.
The second system, SIP Overload Protection, enables you to act upon the data you see in the SIP Server and Application Monitoring Console. Using the SIP Overload Protection interface, you can configure flexible thresholds, statistical algorithms, and actions (such as SNMP traps) that gracefully handle many types of performance issues before they endanger the health of your environment.
SIP Server and Application Monitoring
Converged Application Server provides a console interface for monitoring your Session Initiation Protocol (SIP) servers and SIP applications.
To access the monitoring interface, do the following:
- Use your browser to access the URL http://address:port/console, where address is the Administration Server's listen address and port is its listen port.
Note:
The default administration console port for Converged Application Server is 7001.
- Select the SipServer node in the left pane, and select the Monitoring tab in the right pane.
- In the Monitoring tab, you can select the following subtabs:
  - General: Provides general monitoring data on configured SIP servers.
  - SIP Performance: Provides per-server performance information.
  - SIP Applications: Provides performance information on deployed SIP applications.
  - Call State Storage: Provides state and statistics information for SIP call state.
The following sections provide details on each monitoring subtab.
General
The General subtab of the Monitoring tab provides a variety of general runtime information on messages and sessions for each configured SIP server. Active SIP and application sessions are also totaled at the bottom of the pane.
Table 3-2 General Monitoring Data
Datum | Description |
---|---|
Name | The name of the SIP server instance. |
Start Time | The time at which the SIP server instance was started. |
Application Session Count | The number of active SIP application sessions. |
SIP Session Count | The number of active SIP sessions. |
Destroyed Application Session Count | The number of destroyed application sessions. |
Destroyed SIP Session Count | The number of destroyed SIP sessions. |
Messages Received | The number of SIP messages received. |
Messages Rejected | The number of rejected SIP messages. |
Messages Processed | The total number of SIP messages processed. |
Cluster Id | The Converged Application Server cluster ID. |
The final row of the table provides domain-wide totals for all of the data in the table.
SIP Performance
The SIP Performance subtab of the Monitoring tab provides runtime performance statistics over a period of time for each configured SIP server. The period (default 60 seconds) and sampling interval (default 10 seconds) are noted at the bottom of the pane.
Table 3-3 SIP Performance Monitoring Data
Datum | Description |
---|---|
Name | The name of the SIP server instance. |
SIP Throughput | The SIP message throughput. |
Succeeded SIP Trans | The number of successful SIP transactions. |
Failed SIP Trans | The number of failed SIP transactions. |
SIP Applications
The SIP Applications subtab of the Monitoring tab provides runtime session information for SIP applications deployed on each configured SIP server.
Table 3-4 SIP Applications Data
Datum | Description |
---|---|
Engine | The Converged Application Server engine on which the SIP application is deployed. |
Name | The name of the SIP application. |
SIP Session Count | The number of active SIP sessions. |
Application Session Count | The number of active application sessions. |
Destroyed SIP Session Count | The number of destroyed SIP sessions. |
Destroyed Application Session Count | The number of destroyed application sessions. |
Call State Storage
The Call State Storage subtab of the Monitoring tab provides monitoring data in four additional subtabs:
- Call State Service
- Call State Cache
- Call State Metadata Cache
- Call State Index Cache
The data monitored in each subtab is covered in the following sections.
Call State Service
The Call State Service subtab of the Call State Storage subtab describes state and statistics about the call state Coherence cache service for the entire Converged Application Server domain.
For more details on Coherence statistics and monitoring, see "Introduction to Coherence Management" in Coherence Management Guide.
Table 3-5 Call State Service Monitoring Data
Datum | Description |
---|---|
Server | This is a static label, Total/Average (domainwide). |
Local Messages | The number of messages pending processing. |
Received Messages | The total number of messages received by the host since the statistics were last reset. |
Sent Messages | The total number of messages sent by the host since the statistics were last reset. |
Owned Backup Partitions | The number of partitions that this domain backs up (responsible for the backup storage). |
Owned Primary Partitions | The number of partitions that this domain owns (responsible for the primary storage). |
Endangered Partitions | The number of partitions that are not currently backed up. |
Unbalanced Partitions | The number of primary and backup partitions that remain to be transferred until the partition distribution across the storage-enabled service members is fully balanced. |
Vulnerable Partitions | The number of partitions that are backed up on the same computer where the primary partition owner resides. |
Average Request Duration | The average duration (in milliseconds) of an individual synchronous request issued by the service since the statistics were last reset. |
Max Request Duration | The maximum duration (in milliseconds) of a synchronous request issued by the service since the statistics were last reset. |
Pending Request Duration | The duration (in milliseconds) of the oldest pending synchronous request issued by the service. |
Average Task Duration | The average duration (in milliseconds) of an individual task execution. |
Task Backlog | The size of the backlog queue that holds tasks scheduled to be executed by a service thread. |
Max Task Backlog | The maximum size of the backlog queue since the statistics were last reset. |
Idle Thread Count | The number of currently idle threads in the service thread pool. |
Call State Cache
The Call State Cache subtab of the Call State Storage subtab describes state and statistics about the call state Coherence cache for the entire Converged Application Server domain.
Table 3-6 Call State Cache Monitoring Data
Datum | Description |
---|---|
Server | This is a static label, Total/Average (domainwide). |
Entry Count | The number of entries in the Coherence call state cache. |
Data Size | The total data size of the Coherence call state cache. |
Call State Metadata Cache
The Call State Metadata Cache subtab of the Call State Storage subtab describes state and statistics about the call state metadata Coherence cache for the entire Converged Application Server domain.
Table 3-7 Call State Metadata Cache Monitoring Data
Datum | Description |
---|---|
Server | This is a static label, Total/Average (domainwide). |
Entry Count | The number of entries in the Coherence call state metadata cache. |
Data Size | The total data size of the Coherence call state metadata cache. |
Call State Index Cache
The Call State Index Cache subtab of the Call State Storage subtab describes state and statistics about the call state index Coherence cache for the entire Converged Application Server domain.
Table 3-8 Call State Index Cache Monitoring Data
Datum | Description |
---|---|
Server | This is a static label, Total/Average (domainwide). |
Entry Count | The number of entries in the Coherence call state index cache. |
Data Size | The total data size of the Coherence call state index cache. |
Other Ways to Monitor Converged Application Server
In addition to using the monitoring functionality in the WebLogic console, you can also monitor Converged Application Server using the WebLogic Scripting Tool (WLST), Java Management Extensions (JMX), and the WebLogic Diagnostic Framework (WLDF). The following sections provide details.
Monitoring Applications with the WebLogic Scripting Tool
The WebLogic Scripting Tool (WLST) is a command-line scripting environment that you can use to create, manage, and monitor WebLogic domains. It is based on the Java scripting interpreter, Jython. In addition to supporting standard Jython features such as local variables, conditional variables, and flow control statements, WLST provides a set of scripting functions (commands) that are specific to WebLogic Server.
You can use WLST to retrieve information that WebLogic Server instances produce to describe their run-time state. For more information, see "Getting Runtime Information" in Understanding the WebLogic Scripting Tool.
Developing Custom Management Utilities with JMX
To integrate third-party management systems with the WebLogic Server management system, WebLogic Server provides standards-based interfaces that are fully compliant with the Java Management Extensions (JMX) specification. You can use these interfaces to monitor WebLogic Server MBeans, to change the configuration of a WebLogic Server domain, and to monitor the distribution (activation) of those changes to all server instances in the domain.
To get started creating custom JMX management utilities, see Developing Custom Management Utilities Using JMX for Oracle WebLogic Server.
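As an illustration of that approach, the following Java sketch builds a t3 JMX service URL for a server's Runtime MBean Server and reads the ServerAppSessionCount attribute of the SipServerRuntime MBean (the same object name used by the mbean-stats example later in this chapter). It assumes WebLogic's t3 JMX connector, so the WebLogic client libraries must be on the classpath when readAppSessionCount() actually runs. The class name SipRuntimeJmxClient and all host, port, server, and credential values are placeholders, not part of any product API.

```java
import java.net.MalformedURLException;
import java.util.Hashtable;
import javax.management.MBeanServerConnection;
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;
import javax.naming.Context;

/** Minimal sketch of a JMX client for a WebLogic Runtime MBean Server (all names are placeholders). */
class SipRuntimeJmxClient {

    /** JNDI path of the Runtime MBean Server that each WebLogic server instance hosts. */
    static final String RUNTIME_MBEAN_SERVER = "/jndi/weblogic.management.mbeanservers.runtime";

    /** Builds, e.g., service:jmx:t3://host:7001/jndi/weblogic.management.mbeanservers.runtime */
    static JMXServiceURL runtimeServiceUrl(String host, int port) throws MalformedURLException {
        return new JMXServiceURL("t3", host, port, RUNTIME_MBEAN_SERVER);
    }

    /** Object name of the SipServerRuntime MBean for one server, as used in this chapter. */
    static ObjectName sipServerRuntimeName(String serverName) throws MalformedObjectNameException {
        return new ObjectName("com.bea:ServerRuntime=" + serverName
                + ",Name=" + serverName + ",Type=SipServerRuntime");
    }

    /** Connects over t3 and reads ServerAppSessionCount; needs the WebLogic client libraries at run time. */
    static Object readAppSessionCount(String host, int port, String serverName,
                                      String user, String password) throws Exception {
        Hashtable<String, Object> env = new Hashtable<>();
        env.put(Context.SECURITY_PRINCIPAL, user);
        env.put(Context.SECURITY_CREDENTIALS, password);
        // Tells JMXConnectorFactory which package provides the t3 protocol implementation.
        env.put(JMXConnectorFactory.PROTOCOL_PROVIDER_PACKAGES, "weblogic.management.remote");
        try (JMXConnector connector = JMXConnectorFactory.connect(runtimeServiceUrl(host, port), env)) {
            MBeanServerConnection conn = connector.getMBeanServerConnection();
            return conn.getAttribute(sipServerRuntimeName(serverName), "ServerAppSessionCount");
        }
    }
}
```

To query a particular engine server, pass that server's listen address, listen port, and server name; the object name then resolves to the SipServerRuntime MBean of that server.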
WebLogic Server Diagnostic Framework
The WebLogic Diagnostic Framework (WLDF) consists of a number of components that work together to collect, archive, and access diagnostic information about a WebLogic Server instance and its applications. Converged Application Server integrates with several WLDF components to monitor and diagnose the operation of engines as well as deployed SIP servlets. For details, see Using the WebLogic Server Diagnostic Framework (WLDF).
About Converged Application Server Overload Protection
Converged Application Server implements an overload framework which supports plug-in statistics collectors, plug-in event handlers, as well as multiple threshold settings and statistics collection algorithms.
About the Overload Protection Framework
Converged Application Server overload protection statistics collectors and event handlers are installed as Statistics Provider Interface (SPI) plug-ins. Only a single instance of each statistics collector and event handler is instantiated, and each acts as a utility function within the SPI.
You can configure multiple thresholds for each statistics collector. When a threshold is activated upon an incoming SIP session, samples are collected at a user-configurable interval, and statistics results are calculated according to a user-configurable algorithm. The framework then compares those results with a user-configurable threshold value and executes particular actions depending on the outcome of that comparison.
Configuring Overload Protection
This section describes how to use the WebLogic Administration Console to configure event handlers and statistics collectors.
Perform the following steps in order, because later configurations depend on the earlier steps.
Using the WebLogic Administration Console, do the following:
- Configure a new event handler. See "About Event Handlers".
- Configure actions for the event handler. See "About Actions".
- Configure a statistics collector. See "About Statistics Collectors".
- Configure a threshold, which includes a threshold value, a sampling interval, the number of samples to collect at each interval (or real-time sampling), an algorithm to calculate the collected samples, and actions for upward and downward breaches of the threshold. See "About Thresholds".
About Event Handlers
A Converged Application Server overload protection event handler plugs into the SPI and is discovered when the overload protection framework is initialized. When a particular event handler is discovered, the framework creates and manages only one instance of it. Each event handler must implement one or more actions. When a threshold-breaching event occurs, the framework executes the actions defined for the event handler.
Each event handler can accept an optional, event-handler-scoped set of user-configurable key/value pairs, which are passed as parameters to the event handler's activate() method.
Configuring an Event Handler
To configure an overload protection event handler:
- Open the Administration Console for your domain.
- If your domain is running in Production mode, click Lock & Edit.
- Click the SipServer link in the Domain Structure pane.
The right pane of the console provides two levels of tabbed pages that are used for configuring and monitoring Converged Application Server. By default, the General configuration subtab is selected.
- Click the Overload Protections subtab and then click the Event Handlers subtab.
- In the Event Handlers table, click New.
Enter the following information:
- Event Handler Name: Required. Enter a name for the event handler, for example:
com.oracle.sendSnmpTrap
Table 3-9 Default Event Handlers
Event Handler | Description |
---|---|
com.bea.wcp.sip.engine.server.olp.handler.ControlTrafficHandler | Used on a SIP container to accept or reject call traffic for new call setups. |
com.bea.wcp.sip.engine.server.olp.handler.SendSNMPTrapHandler | Used to send SNMP traps. |
- Attributes: Optional. Specify key/value attribute pairs separated by semicolons, for example:
attribute1=21;attribute2=64
Attributes are passed to the event handler as parameters when the event is triggered.
Note:
The com.bea.wcp.sip.engine.server.olp.handler.SendSNMPTrapHandler event handler supports an snmp-trap-message attribute. Its default value is overloadControlActivated. No attributes are supported for the com.bea.wcp.sip.engine.server.olp.handler.ControlTrafficHandler event handler.
- Click Save to save your configuration changes.
- If your domain is running in Production mode, click Activate Changes to apply your changes to the engine servers.
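The Attributes field uses a semicolon-separated key=value syntax. The following Java sketch shows one plausible way such a string could be turned into the key/value map that is handed to an event handler. AttributeParser and parse() are hypothetical names, and the quote-stripping rule is an assumption based on the quoted object-name example for the mbean-stats collector, not documented product behavior.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical parser for "key1=value1;key2=value2" attribute strings (illustration only). */
class AttributeParser {

    /** Splits on ';', then on the first '=' of each pair; blank pairs are skipped. */
    static Map<String, String> parse(String attributes) {
        Map<String, String> result = new LinkedHashMap<>();
        if (attributes == null) {
            return Collections.unmodifiableMap(result);
        }
        for (String pair : attributes.split(";")) {
            if (pair.trim().isEmpty()) {
                continue; // tolerates empty input and a trailing ';'
            }
            int eq = pair.indexOf('=');
            if (eq <= 0) {
                throw new IllegalArgumentException("Expected key=value but got: " + pair);
            }
            String key = pair.substring(0, eq).trim();
            String value = pair.substring(eq + 1).trim();
            // Assumption: surrounding double quotes delimit the value and are not part of it.
            if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                value = value.substring(1, value.length() - 1);
            }
            result.put(key, value);
        }
        return Collections.unmodifiableMap(result);
    }
}
```

Splitting only on the first '=' matters because values such as an MBean object name contain further '=' characters.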
About Actions
After you have defined an event handler, you must define one or more actions for the event handler to take when a threshold-breaching event occurs. As with event handlers, actions are plugged into the overload protection framework using the SPI and are discovered when the framework is initialized. When an action is discovered, the framework creates and manages only one instance of it.
Each action can accept an optional, action-scoped set of user-configurable key/value pairs, which are passed as parameters to the action's activate() method.
The action types supported out of the box are listed in Table 3-10.
Configuring an Action
To configure an overload protection action:
- Open the Administration Console for your domain.
- If your domain is running in Production mode, click Lock & Edit.
- Click the SipServer link in the Domain Structure pane.
The right pane of the console provides two levels of tabbed pages that are used for configuring and monitoring Converged Application Server. By default, the General configuration subtab is selected.
- Click the Overload Protections subtab and then click the Actions subtab.
- In the Actions table, click New.
Enter the following information:
- Action Name: Required. Enter a name for the action, for example:
TrafficReject
- Event Handler: Required. Choose the name of an event handler you have created from the drop-down list. For information on configuring an event handler, see "About Event Handlers".
- Action Type: Required. Enter an Action Type supported by the Event Handler, for example:
reject-traffic
Table 3-10 Default Action Types
Action Type | Description |
---|---|
accept-traffic | Used by the com.bea.wcp.sip.engine.server.olp.handler.ControlTrafficHandler event handler. After an overload condition has cleared, accepts SIP session traffic. |
reject-traffic | Used by the com.bea.wcp.sip.engine.server.olp.handler.ControlTrafficHandler event handler. When an overload condition occurs, rejects SIP session traffic. SIP session traffic continues to be rejected until an accept-traffic action is triggered. |
default | Used by the com.bea.wcp.sip.engine.server.olp.handler.SendSNMPTrapHandler event handler. |
- Attributes: Optional. Specify key/value attribute pairs separated by semicolons, for example:
attribute1=21;attribute2=64
Attributes are passed when the action is triggered.
Note:
Support for attributes depends on the implementation of the particular action. None of the default Action Types support any attributes.
- Click Save to save your configuration changes.
- If your domain is running in Production mode, click Activate Changes to apply your changes to the engine servers.
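Table 3-10 describes reject-traffic and accept-traffic as a latch: once traffic is rejected, it stays rejected until an accept-traffic action runs. The following Java sketch models only that described behavior. TrafficGate, apply(), and admitNewSession() are hypothetical names, not the ControlTrafficHandler implementation; only the action type strings come from this chapter.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical model of the accept-traffic / reject-traffic latch described in Table 3-10. */
class TrafficGate {
    // AtomicBoolean because, in this sketch, actions and SIP requests may run on different threads.
    private final AtomicBoolean rejecting = new AtomicBoolean(false);

    /** Applies one action type; anything other than the two traffic actions is refused. */
    void apply(String actionType) {
        switch (actionType) {
            case "reject-traffic":
                rejecting.set(true);   // overload detected: stop admitting new sessions
                break;
            case "accept-traffic":
                rejecting.set(false);  // overload cleared: admit new sessions again
                break;
            default:
                throw new IllegalArgumentException("Unsupported action type: " + actionType);
        }
    }

    /** Returns true if a new SIP session should be admitted. */
    boolean admitNewSession() {
        return !rejecting.get();
    }
}
```

In the session-rate example at the end of this chapter, TrafficReject corresponds to apply("reject-traffic") on an upward breach and TrafficAccept to apply("accept-traffic") on a downward breach.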
About Statistics Collectors
Statistics collectors are also plugged into the overload protection framework using the SPI and are discovered when the framework is initialized. When a particular statistics collector is discovered, the framework creates and manages only one instance of it.
Each statistics collector consists of a name, a type, and optional attributes. The collector name is referred to when defining a threshold, as described in "Configuring a Threshold". The overload protection framework retrieves statistics samples using the statistics collector's getStats() method, to which the optional attributes are passed as parameters.
The statistics collectors supported out of the box are described in Table 3-11.
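To make the collector contract concrete, the following Java sketch models a collector whose getStats() method receives the configured attributes and returns one numeric sample. The StatisticsCollector interface, its double return type, and the QueueLengthCollector class are hypothetical; only the method name getStats() and the queue-length behavior (the sum of the transport and timer work manager queue lengths, per Table 3-11) come from this chapter. The two queue lengths are injected as suppliers so that the sketch does not depend on any WebLogic work manager API.

```java
import java.util.Map;
import java.util.function.IntSupplier;

/** Hypothetical shape of a statistics collector plug-in (not the product SPI). */
interface StatisticsCollector {
    /** Returns one statistics sample; attrs holds the collector's optional key/value attributes. */
    double getStats(Map<String, String> attrs);
}

/** Sketch of a queue-length style collector: transport queue length plus timer queue length. */
class QueueLengthCollector implements StatisticsCollector {
    private final IntSupplier transportQueueLength;
    private final IntSupplier timerQueueLength;

    QueueLengthCollector(IntSupplier transportQueueLength, IntSupplier timerQueueLength) {
        this.transportQueueLength = transportQueueLength;
        this.timerQueueLength = timerQueueLength;
    }

    @Override
    public double getStats(Map<String, String> attrs) {
        // This sketch uses no attributes; it simply sums the two current queue lengths.
        return transportQueueLength.getAsInt() + timerQueueLength.getAsInt();
    }
}
```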
Configuring a Statistics Collector
To configure an overload protection statistics collector:
- Open the Administration Console for your domain.
- If your domain is running in Production mode, click Lock & Edit.
- Click the SipServer link in the Domain Structure pane.
The right pane of the console provides two levels of tabbed pages that are used for configuring and monitoring Converged Application Server. By default, the General configuration subtab is selected.
- Click the Overload Protections subtab and then click the Statistics Collector subtab.
- In the Statistics Collector table, click New.
Enter the following information:
- Statistics Collector Name: Required. Enter a name for the statistics collector, for example:
MBeanStatsCollector
- Statistics Collector Type: Required. Enter a statistics collector type, for example:
mbean-stats
The following table lists the Statistics Collector Types supplied with Converged Application Server.
Table 3-11 Default Statistics Collector Types
Statistics Collector Type | Description |
---|---|
queue-length | Uses the sum of the transport and timer work manager queue lengths. |
mbean-stats | Uses an MBean counter as a statistics sample. |
memory-usage | Returns the call state memory usage from Coherence. |
active-diameter-session | Returns the number of active Diameter sessions. |
- Attributes: Optional except for the mbean-stats collector type. Specify key/value attribute pairs separated by semicolons, for example:
attribute1=21;attribute2=64
Attributes are passed as parameters to the statistics collector's getStats() method.
Note:
The mbean-stats collector lets you use an MBean counter for statistics samples. When configuring the collector, you must set the object-name and attribute-name attributes so that the collector can find the attribute value of the particular MBean.
For the object-name attribute, you can use the variable ${server_name}, which is replaced with the name of the Managed Server on which the statistics collector is running.
The following example shows a configuration retrieving the ServerAppSessionCount attribute from the SipServerRuntime MBean on the current server.
object-name="com.bea:ServerRuntime=${server_name},Name=${server_name},Type=SipServerRuntime";attribute-name=ServerAppSessionCount
For a complete list of Converged Application Server MBeans, see the Oracle Communications Converged Application Server Java API Reference.
- Click Save to save your configuration changes.
- If your domain is running in Production mode, click Activate Changes to apply your changes to the engine servers.
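The following Java sketch shows how an mbean-stats style collector could take one sample: substitute ${server_name} into the object-name attribute, then read attribute-name through any javax.management.MBeanServerConnection. MBeanStatsSampler and its methods are hypothetical, and stripping the surrounding double quotes is an assumption based on the example above. Because the sampler accepts any MBeanServerConnection, it works with a remote connection to an engine server or, for a local smoke test, with the JVM's own platform MBean server.

```java
import java.util.Map;
import javax.management.MBeanServerConnection;
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

/** Hypothetical sketch of how an mbean-stats style collector could take one sample. */
class MBeanStatsSampler {

    /** Substitutes ${server_name} and drops optional surrounding double quotes. */
    static ObjectName resolveObjectName(String objectNameAttr, String serverName)
            throws MalformedObjectNameException {
        String name = objectNameAttr.trim().replace("${server_name}", serverName);
        if (name.length() >= 2 && name.startsWith("\"") && name.endsWith("\"")) {
            name = name.substring(1, name.length() - 1);
        }
        return new ObjectName(name);
    }

    /** Reads attrs["attribute-name"] from the MBean named by attrs["object-name"]. */
    static double sample(MBeanServerConnection conn, Map<String, String> attrs, String serverName)
            throws Exception {
        String objectName = attrs.get("object-name");
        String attributeName = attrs.get("attribute-name");
        if (objectName == null || attributeName == null) {
            // Both attributes must be set for an mbean-stats collector.
            throw new IllegalArgumentException("object-name and attribute-name must be set");
        }
        Object value = conn.getAttribute(resolveObjectName(objectName, serverName), attributeName);
        return ((Number) value).doubleValue(); // assumes the MBean counter is numeric
    }
}
```

For instance, with the platform MBean server (ManagementFactory.getPlatformMBeanServer()) and the attributes object-name=java.lang:type=Threading;attribute-name=ThreadCount, sample() returns the JVM's current thread count, which exercises the code without a running Converged Application Server.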
About Thresholds
An overload protection threshold consists of a threshold value, a collector, sampling settings, and two lists of overload protection actions defined for an event handler.
Thresholds work in two modes: a sampling mode with a configurable interval and number of samples, and a real-time mode. For both modes, statistics samples are collected and calculated according to a selectable algorithm and compared to the threshold value. Each threshold has two events, UP_EVENT and DOWN_EVENT. When the threshold is breached upward, the UP_EVENT event is triggered, and when it is breached downward, the DOWN_EVENT event is triggered.
For each event, you can configure a list of event handler actions. When an event is triggered, the overload protection framework will execute each action associated with the threshold event.
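The UP_EVENT and DOWN_EVENT behavior can be sketched as an edge-triggered threshold: an event fires only when the calculated value crosses the threshold value, not on every sample that stays above or below it. In the following Java sketch, OverloadThreshold and evaluate() are hypothetical names, and two details are assumptions rather than documented behavior: the threshold starts in the not-breached state, and a value equal to the threshold value counts as not breached (the Threshold Value is described as the value that must be exceeded).

```java
import java.util.List;

/** Hypothetical edge-triggered threshold that runs up/down action lists on crossings. */
class OverloadThreshold {

    enum Event { UP_EVENT, DOWN_EVENT }

    private final double thresholdValue;
    private final List<Runnable> upActions;
    private final List<Runnable> downActions;
    private boolean breached = false; // assumption: starts in the not-breached state

    OverloadThreshold(double thresholdValue, List<Runnable> upActions, List<Runnable> downActions) {
        this.thresholdValue = thresholdValue;
        this.upActions = upActions;
        this.downActions = downActions;
    }

    /** Feeds one calculated value; returns the event fired, or null when nothing was crossed. */
    Event evaluate(double calculatedValue) {
        boolean nowBreached = calculatedValue > thresholdValue; // equal counts as not breached
        if (nowBreached == breached) {
            return null; // still on the same side of the threshold, so no event
        }
        breached = nowBreached;
        // Execute every action associated with the triggered event, in configured order.
        for (Runnable action : nowBreached ? upActions : downActions) {
            action.run();
        }
        return nowBreached ? Event.UP_EVENT : Event.DOWN_EVENT;
    }
}
```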
Configuring a Threshold
To configure an overload protection Threshold:
- Open the Administration Console for your domain.
- If your domain is running in Production mode, click Lock & Edit.
- Click the SipServer link in the Domain Structure pane.
The right pane of the console provides two levels of tabbed pages that are used for configuring and monitoring Converged Application Server. By default, the General configuration subtab is selected.
- Click the Overload Protections subtab and then click the Thresholds subtab.
- In the Thresholds table, click New.
Enter the following information:
- Threshold Name: Required. Enter a name for the threshold, for example:
queueLengthThreshold
- Threshold Value: Required. Enter the level of the threshold. This is the value that the calculated statistic must exceed to trigger an event, for example:
10.0
Note:
The Threshold Value cannot be greater than 100.
- Sampling Mode: Required. Choose either realtime or sampling from the drop-down list. In realtime mode, statistics are compared against the Threshold Value each time an initial SIP message is received; no algorithm calculations are supported in this mode.
- Sampling Interval: Required when sampling mode is selected. Enter the interval, in milliseconds, at which samples are taken, for example:
1000
- Sampling Number: Required when sampling mode is selected. Enter the number of samples to be taken at each Sampling Interval, for example:
5
- Algorithm Name: Required. Choose an appropriate algorithm to calculate samples.
Table 3-12 Algorithm Types
Algorithm Name | Description |
---|---|
PERCNTILE | Calculates the Pth percentile value of the samples. When PERCNTILE is selected, an Algorithm Parameter value must be provided. |
AVERAGE | Calculates the average of the samples (sum of samples divided by number of samples). |
VALUE | The value of the last sample. |
RATE | The sample rate, calculated as (last sample - first sample)/(sampling interval). |
- Algorithm Parameter: Required when the PERCNTILE algorithm is selected. Enter the percentile P to calculate, for example:
65
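- Statistics Collector: Choose the statistics collector that supplies the samples from the drop-down list. For information on configuring a statistics collector, see "About Statistics Collectors".
- Enable: Optional. Check Enable to enable the Threshold.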
- Click Next.
- Choose the Actions to be executed when a threshold is breached upwards (if any) by moving an Action from the Available list to the Chosen list.
- Click Next.
- Choose the Actions to be executed when a threshold is breached downwards (if any) by moving an Action from the Available list to the Chosen list.
- Click Finish to save your configuration changes.
- If your domain is running in Production mode, click Activate Changes to apply your changes to the engine servers.
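The four algorithms in Table 3-12 can be sketched in Java as follows, each operating on one window of collected samples. OlpAlgorithms is a hypothetical class name. The PERCNTILE implementation uses the nearest-rank percentile method, which is an assumption because this chapter does not specify how the percentile is computed; RATE follows the table literally, so with a sampling interval in milliseconds its result is in sample units per millisecond.

```java
import java.util.Arrays;

/** Sketch of the four algorithms in Table 3-12 applied to one window of samples. */
final class OlpAlgorithms {
    private OlpAlgorithms() {}

    /** AVERAGE: sum of samples divided by number of samples. */
    static double average(double[] samples) {
        requireSamples(samples);
        double sum = 0;
        for (double s : samples) {
            sum += s;
        }
        return sum / samples.length;
    }

    /** VALUE: the last sample. */
    static double value(double[] samples) {
        requireSamples(samples);
        return samples[samples.length - 1];
    }

    /** RATE: (last sample - first sample) / sampling interval, as written in Table 3-12. */
    static double rate(double[] samples, double samplingInterval) {
        requireSamples(samples);
        return (samples[samples.length - 1] - samples[0]) / samplingInterval;
    }

    /** PERCNTILE: Pth percentile by nearest rank (assumed method; the product may differ). */
    static double percentile(double[] samples, double p) {
        requireSamples(samples);
        if (p <= 0 || p > 100) {
            throw new IllegalArgumentException("P must be in (0, 100]");
        }
        double[] sorted = samples.clone(); // do not reorder the caller's window
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length); // 1-based nearest rank
        return sorted[rank - 1];
    }

    private static void requireSamples(double[] samples) {
        if (samples == null || samples.length == 0) {
            throw new IllegalArgumentException("at least one sample is required");
        }
    }
}
```

For example, with samples 4, 8, 6, 10, and 2, AVERAGE yields 6.0, VALUE yields 2.0, and PERCNTILE with an Algorithm Parameter of 65 yields 8.0 under the nearest-rank assumption.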
Example: Configuring Overload Protection Based upon Session Rate
In the following example, you create an overload protection scheme based upon the session rate. You begin by creating a com.bea.wcp.sip.engine.server.olp.handler.ControlTrafficHandler event handler to react to traffic control events. Next, you create two actions that the event handler will initiate: one to reject SIP session traffic and another to accept SIP session traffic. You then create a statistics collector that reads counter information from the SipServerRuntime MBean. Finally, you create a threshold that takes 5 samples every 1000 milliseconds and reacts to upward and downward breaches of a threshold value you set.
After this is configured, when the threshold value is breached upward, SIP traffic is rejected until the threshold value is breached downward again.
To configure a session rate overload protection scheme:
- Open the Administration Console for your domain.
- If your domain is running in Production mode, click Lock & Edit.
- Click the SipServer link in the Domain Structure pane.
The right pane of the console provides two levels of tabbed pages that are used for configuring and monitoring Converged Application Server. By default, the General configuration subtab is selected.
- Click the Overload Protections subtab and then click the Event Handlers subtab.
- In the Event Handlers table, click New, and enter com.bea.wcp.sip.engine.server.olp.handler.ControlTrafficHandler for the Event Handler Name.
- Click Save to save your configuration changes.
- Click the SipServer link in the Domain Structure pane.
- Click the Overload Protections subtab and then click the Actions subtab.
- In the Actions table, click New and enter the following information:
- Action Name: Enter TrafficReject.
- Event Handler: Select com.bea.wcp.sip.engine.server.olp.handler.ControlTrafficHandler from the drop-down list.
- Action Type: Enter reject-traffic.
- Click Save to save your configuration changes.
- In the Actions table, click New and enter the following information:
- Action Name: Enter TrafficAccept.
- Event Handler: Select com.bea.wcp.sip.engine.server.olp.handler.ControlTrafficHandler from the drop-down list.
- Action Type: Enter accept-traffic.
- Click Save to save your configuration changes.
- Click the SipServer link in the Domain Structure pane.
- Click the Overload Protections subtab and then click the Statistics Collector subtab.
- In the Statistics Collectors table, click New and enter the following information:
- Statistics Collector Name: Enter com.bea.wcp.sip.engine.server.olp.collector.MBeanCollector.
- Statistics Collector Type: Enter mbean-stats.
- Attributes: Enter:
object-name="com.bea:ServerRuntime=${server_name},Name=${server_name},Type=SipServerRuntime";attribute-name=ServerAppSessionCount
- Click Save to save your configuration changes.
- Click the SipServer link in the Domain Structure pane.
- Click the Overload Protections subtab and then click the Thresholds subtab.
- In the Thresholds table, click New and enter the following information:
- Threshold Name: Enter SessionRate.
- Threshold Value: Enter the threshold value you want to use for the session rate, as calculated by the RATE algorithm.
Note:
The Threshold Value cannot be greater than 100.
- Sampling Mode: Select sampling from the drop-down list.
- Sampling Interval: Enter 1000 to take a sample every 1000 milliseconds.
- Sampling Number: Enter 5 to take 5 samples at each sampling interval.
- Algorithm Name: Select RATE from the drop-down list.
- Statistics Collector: Select com.bea.wcp.sip.engine.server.olp.collector.MBeanCollector from the drop-down list.
- Check Enable.
- Click Next.
- For Up Actions, move TrafficReject from the Available list to the Chosen list.
- Click Next.
- For Down Actions, move TrafficAccept from the Available list to the Chosen list.
- Click Finish.
- If your domain is running in Production mode, click Activate Changes to apply your changes to the engine servers.