79 Configuring System Overload Protection
Learn how to implement system overload protection in Oracle Communications Billing and Revenue Management (BRM) Elastic Charging Engine (ECE).
System overload can cause charging nodes to become stuck or unresponsive. The following scenarios might lead to an overloaded system:
-
An undersized ECE deployment
-
Large batches of offline records
-
Bulk customer updates that trigger numerous update requests
When the usage request throughput from ECE exceeds the system’s capacity, ECE reduces its throughput until it reaches a sustainable error-free level. ECE also notifies the client of any submitted requests that it could not process.
Note:
Management requests are always accepted, even when the system is overloaded.
Overload protection uses thread pools to accept and process requests submitted to the system. Thread pools enhance performance during large update operations by reducing the overhead associated with each update. They also bound and manage the resources used to handle requests.
In an overload situation, ECE begins to reject incoming requests. You can configure overload protection in one of the following ways, depending on how you want the system to handle requests:
-
Reject all incoming requests during overload conditions. See "Configuring Overload Protection to Reject All Requests".
-
Reject requests based on priority. See "Configuring Overload Protection to Reject Requests Based on Priority".
Configuring Overload Protection to Reject All Requests
When you configure an ECE gateway for overload protection, it continuously monitors the number of pending requests in the queue to determine if it exceeds the threshold. If the number of pending requests exceeds this threshold, the gateway starts to reject all new incoming requests with a BRS overload exception. This rejection policy applies to all types of requests.
Upon rejecting the requests, the ECE gateway sends the client an error message indicating that the server is busy. The client can then redirect these requests to an alternative gateway. When the number of pending requests falls back below the configured threshold, the gateway resumes processing new incoming requests.
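The accept-or-reject decision described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not ECE's implementation: the class name PendingRequestGate and its methods are hypothetical, and the comparison follows the "equals or exceeds" wording used for AcceptablePendingCount in Table 79-1.

```python
class PendingRequestGate:
    """Hypothetical model of a gateway that rejects requests during overload."""

    def __init__(self, acceptable_pending_count=10, overload_protection=True):
        # Names mirror the MBean attributes, but this class is illustrative only.
        self.acceptable_pending_count = acceptable_pending_count
        self.overload_protection = overload_protection
        self.pending = 0

    def try_accept(self):
        """Return True if the request is queued, False if it is rejected."""
        if self.overload_protection and self.pending >= self.acceptable_pending_count:
            return False  # the client would receive a "server busy" style error
        self.pending += 1
        return True

    def complete(self):
        """Mark one queued request as processed."""
        self.pending -= 1


gate = PendingRequestGate(acceptable_pending_count=2)
results = [gate.try_accept() for _ in range(3)]  # third request is rejected
gate.complete()                                   # queue drains by one
resumed = gate.try_accept()                       # accepted again
```

Once the pending count drops below the threshold, the gate accepts requests again, which matches the resume behavior described above.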
To configure system overload protection during runtime:
-
Access the ECE configuration MBeans in a JMX editor, such as JConsole. See "Accessing ECE Configuration MBeans".
-
Expand the ECE Configuration node.
-
Expand BatchRequestService.
-
Expand Attributes.
-
Specify values for the overload protection attributes described in Table 79-1.
Alternatively, you can set the attributes in Table 79-1 in your ECE_home/config/management/charging-settings.xml file and then restart your ECE system. See "Starting and Stopping ECE".
Note:
Values that you set for the following MBean attributes at runtime are not persisted. If the ECE gateway is restarted, the attributes are reset to their default values.
Table 79-1 MBean Attributes to Configure Overload Protection to Reject All Requests
MBean Attribute | Description |
---|---|
AcceptablePendingCount | The maximum number of requests that can be accepted and queued for processing. If the number of pending requests in the queue equals or exceeds this threshold, ECE starts rejecting incoming requests. The pending count should match the number of threads. Select this value carefully, based on the expected throughput of the ECE instance and the expected latency of each request as indicated by your performance testing results. The default is 10. |
OverloadProtection | The flag that enables overload protection for rejecting all incoming requests. The default value is disabled (false). |
Configuring Overload Protection to Reject Requests Based on Priority
Note:
-
Priority-based overload protection is applicable only to HTTP Gateway and Diameter Gateway.
-
This functionality requires ECE Interim Patch 15.1.0.0.1 (37951934).
You can configure overload protection in ECE to actively manage incoming requests based on a priority system. The priority levels are determined by the overload condition, which is calculated based on the following factors:
-
The number of requests in the queue
-
The average latency
A Threshold Overload Monitor independently monitors the queue depth and latency using an exponentially weighted moving average (EWMA) combined with hysteresis (separate entry and exit thresholds and delays, so that the current state depends on recent system history). EWMA allows the monitor to adjust thresholds automatically based on data trends, while hysteresis ensures smooth transitions in and out of overload protection levels, even when there are short-lived threshold crossings. This mechanism ensures that the system exits an overload level only after the underlying conditions are fully resolved.
When overload conditions exceed the configured threshold for a specific level (one, two, or three), the Threshold Overload Monitor activates the corresponding overload protection level. For example, by default, if the average latency surpasses 1000 milliseconds, the monitor triggers level one overload protection.
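To make the interaction between EWMA smoothing and hysteresis concrete, the following Python sketch models a single overload level. It is a simplified illustration, not ECE's algorithm: the class LevelMonitor is hypothetical, it ignores the entry and exit delays and the adaptivity factor, and the threshold values are taken from the sample configuration later in this chapter.

```python
class LevelMonitor:
    """Hypothetical single-level monitor: EWMA smoothing plus hysteresis."""

    def __init__(self, entry_threshold, exit_threshold, alpha=0.1):
        self.entry_threshold = entry_threshold
        self.exit_threshold = exit_threshold
        self.alpha = alpha      # weight given to the newest sample
        self.ewma = None        # smoothed metric, unset until the first sample
        self.active = False     # whether the overload level is currently active

    def update(self, sample):
        """Feed one latency sample (ms) and return whether the level is active."""
        if self.ewma is None:
            self.ewma = float(sample)
        else:
            self.ewma = self.alpha * sample + (1 - self.alpha) * self.ewma
        if not self.active and self.ewma > self.entry_threshold:
            self.active = True          # enter only above the entry threshold
        elif self.active and self.ewma < self.exit_threshold:
            self.active = False         # exit only below the lower exit threshold
        return self.active


# alpha=0.5 is used here only to keep the arithmetic easy to follow.
monitor = LevelMonitor(entry_threshold=2000, exit_threshold=1250, alpha=0.5)
states = [monitor.update(s) for s in (1000, 4000, 1000, 500)]
```

In this run the smoothed latency moves through 1000, 2500, 1750, and 1125. The level stays active at 1750 because that value is still above the exit threshold, even though it is already below the entry threshold.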
Each operation is assigned a default priority (LOW, MEDIUM, or HIGH), which you can modify. You can also update the list of operations. The following describes the behavior of the Threshold Overload Monitor when it activates a level:
-
Level One: It rejects requests for operations with a LOW priority, such as create requests, preventing the creation of new objects and conserving memory.
-
Level Two: It starts rejecting requests for operations with a MEDIUM priority, such as update requests.
At this level, the system rejects both LOW and MEDIUM priority operations.
-
Level Three: It starts rejecting requests for operations with a HIGH priority, such as terminate requests.
At this level, the system rejects all LOW, MEDIUM, and HIGH priority operations.
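The cumulative behavior of the three levels can be expressed as a simple comparison. The sketch below is illustrative only: the function name is_rejected, the numeric ranks, and the use of 0 for "no overload" are assumptions made for this example.

```python
# Rank each priority by the first overload level at which it is rejected.
PRIORITY_RANK = {"LOW": 1, "MEDIUM": 2, "HIGH": 3}


def is_rejected(priority, overload_level):
    """Return True if a request with this priority is rejected.

    overload_level is 0 when no level is active, otherwise 1, 2, or 3.
    """
    return PRIORITY_RANK[priority] <= overload_level


# At level two, LOW and MEDIUM operations are rejected but HIGH still passes.
level_two = {p: is_rejected(p, 2) for p in ("LOW", "MEDIUM", "HIGH")}
```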
To configure overload protection so that requests are rejected based on priority, follow these steps:
-
Specify the values for entering and exiting overload levels and adjust the EWMA and hysteresis controls. See "Configuring the Threshold Overload Monitor".
-
Assign a LOW, MEDIUM, or HIGH priority to each operation. See "Assigning Priorities to Operations".
Note:
Set the following attributes, which are described in Table 79-1:
-
OverloadProtection to true.
-
AcceptablePendingCount to 0.
Configuring the Threshold Overload Monitor
To enable priority-based overload protection and configure the Threshold Overload Monitor, set the overloadConfigurations attributes in the ECE_home/config/management/charging-settings.xml file. You can do this during installation or when you are ready to restart ECE. To do so:
-
Open your charging-settings.xml file.
-
In the overloadConfigurations section, set the attributes as described in "overloadConfigurations Attribute Entries".
For example:
<overloadConfigurations config-class="oracle.communication.brm.charging.appconfiguration.beans.overload.OverloadConfigurations">
  <overloadConfigurationList config-class="java.util.ArrayList">
    <overloadConfiguration config-class="oracle.communication.brm.charging.appconfiguration.beans.overload.OverloadConfiguration" name="default" enabled="true" compositionType="OR">
      <thresholdOverloadMonitorList config-class="java.util.ArrayList">
        <thresholdOverloadMonitor config-class="oracle.communication.brm.charging.appconfiguration.beans.overload.ThresholdOverloadMonitorConfiguration" name="BRS Monitor">
          <thresholdMonitors config-class="java.util.ArrayList">
            <thresholdMonitor config-class="oracle.communication.brm.charging.appconfiguration.beans.overload.ThresholdMonitorConfiguration" name="BRS Latency" type="BRS_LATENCY" updateIntervalMs="100" initialDelayMs="10000">
              <defaults config-class="oracle.communication.brm.charging.appconfiguration.beans.overload.ThresholdMonitorLevelConfiguration" entryDelayMs="1000" exitDelayMs="500" alpha="0.1" adaptivityFactor="0.5"/>
              <levelOne config-class="oracle.communication.brm.charging.appconfiguration.beans.overload.ThresholdMonitorLevelConfiguration" entryThreshold="2000" exitThreshold="1250" entryDelayMs="1000" exitDelayMs="500" alpha="0.1" adaptivityFactor="0.5"/>
              <levelTwo config-class="oracle.communication.brm.charging.appconfiguration.beans.overload.ThresholdMonitorLevelConfiguration" entryThreshold="2500" exitThreshold="2250" entryDelayMs="1000" exitDelayMs="500" alpha="0.1" adaptivityFactor="0.5"/>
              <levelThree config-class="oracle.communication.brm.charging.appconfiguration.beans.overload.ThresholdMonitorLevelConfiguration" entryThreshold="3000" exitThreshold="2700" entryDelayMs="1000" exitDelayMs="500" alpha="0.1" adaptivityFactor="0.5"/>
            </thresholdMonitor>
          </thresholdMonitors>
        </thresholdOverloadMonitor>
      </thresholdOverloadMonitorList>
    </overloadConfiguration>
  </overloadConfigurationList>
</overloadConfigurations>
-
Restart ECE. See "Starting and Stopping ECE".
overloadConfigurations Attribute Entries
Table 79-2 lists the attributes that enable priority-based overload protection.
Table 79-2 Attributes for Enabling Overload Protection Based On Priority
Parameter | Description |
---|---|
name | The name of the overload configuration. The default value is default. |
enabled | The flag that enables overload protection based on priority level. The default value is disabled (false). |
compositionType | Whether the overload conditions are combined using OR or AND. The default is OR. |
Table 79-3 lists the attributes to set up and configure the threshold overload monitor.
Table 79-3 Attributes for Configuring the Threshold Overload Monitor
Parameter | Description |
---|---|
name | The name of the Threshold Overload Monitor. The default value is BRS Latency. |
type | The type of overload metric being monitored. The default value is BRS_LATENCY. |
updateIntervalMs | The interval, in milliseconds, at which the threshold is recalculated. The default is 100. |
initialDelayMs | The initial delay, in milliseconds, before the monitor begins to track the overload. The default is 10000. |
Table 79-4 lists the attributes that define the behavior of the default overload level, including delays and smoothing factors.
Table 79-4 Attributes for Configuring the Default Level
Parameter | Description |
---|---|
entryDelayMs | The delay, in milliseconds, before the system enters an overload level after the entry threshold is crossed. The default value for each level is 1000. |
exitDelayMs | The delay, in milliseconds, before the system exits an overload level after the metric falls below the exit threshold. The default value for each level is 500. |
alpha | The control for how quickly the system adapts to a new condition. The default value is 0.1. |
adaptivityFactor | The control for how much the system adjusts its threshold in response to changing conditions. The default value is 0.5. |
Table 79-5 lists the attributes to set the behavior of specific overload levels, allowing users to set entry and exit thresholds, delays, and adaptive parameters for each level. You must set all these parameters individually for level one, level two, and level three.
Table 79-5 Attributes for Configuring Level One, Level Two, and Level Three
Parameter | Description |
---|---|
entryThreshold | The threshold for the number of requests at which the system enters the overload state. |
exitThreshold | The threshold for the number of requests at which the system exits the overload state. |
entryDelayMs | The delay, in milliseconds, before the system enters an overload level after the entry threshold is crossed. The default value for each level is 1000. |
exitDelayMs | The delay, in milliseconds, before the system exits an overload level after the metric falls below the exit threshold. The default value for each level is 500. |
alpha | The control for how quickly the system adapts to a new condition. The default value is 0.1. |
adaptivityFactor | The control for how much the system adjusts its threshold in response to changing conditions. The default value is 0.5. |
Note:
-
You need to apply all the attributes in Table 79-5 separately for each level.
-
Oracle strongly recommends using the default values for the parameters in Table 79-4 and Table 79-5. If your system requires different values, contact Oracle Support before making the changes. For more information, see "Common Pitfalls While Configuring Overload Protection Parameters".
Common Pitfalls While Configuring Overload Protection Parameters
The values you assign to the overload protection parameters can cause the following common issues:
-
Low or high value for alpha: If the value for alpha is too low, it may smooth out short-term changes to the point that genuine issues are overlooked. Conversely, if the value for alpha is too high, the monitor can react strongly even to brief anomalies.
-
Close threshold values: If the entry and exit threshold values are too close together, it can result in frequent state changes.
-
Short delay values: If the delay for entry or exit is too short, transient spikes can trigger false alarms.
-
High adaptivity values: If the adaptivity is too high, the thresholds continuously adjust, potentially missing issues that require attention.
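The effect of alpha can be seen with a short worked example. The Python sketch below applies the standard EWMA formula to a hypothetical latency series that contains one transient spike. The sample values and the function name ewma_series are assumptions for illustration, and the 2000 ms comparison value is the level one entryThreshold from the sample configuration earlier in this chapter.

```python
def ewma_series(samples, alpha):
    """Return the exponentially weighted moving average after each sample."""
    value = float(samples[0])
    smoothed = [value]
    for sample in samples[1:]:
        # Standard EWMA update: alpha weights the newest sample.
        value = alpha * sample + (1 - alpha) * value
        smoothed.append(value)
    return smoothed


# Hypothetical latencies in milliseconds with one brief spike.
latencies = [500, 500, 5000, 500, 500]

low_alpha = ewma_series(latencies, alpha=0.1)   # peaks near 950 ms
high_alpha = ewma_series(latencies, alpha=0.9)  # peaks near 4550 ms

ENTRY_THRESHOLD = 2000
low_alpha_trips = max(low_alpha) > ENTRY_THRESHOLD
high_alpha_trips = max(high_alpha) > ENTRY_THRESHOLD
```

With alpha set to 0.1, the single spike never pushes the average past 2000 ms, whereas with alpha set to 0.9 the same spike crosses the threshold immediately. A sustained rise in latency would eventually cross the threshold in both cases, only more slowly with the lower alpha.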
Assigning Priorities to Operations
All BRS requests contain messages, and each message has one or more operations. Each operation is mapped to a priority level. The priority level assigned to an operation, and therefore to the request associated with it, determines the overload level at which the request is rejected. For example, a request with LOW priority is rejected during overload level one.
You can find the default operations and their priorities in the charging-settings.xml file. You can modify these in either of the following ways:
-
During installation. See "Configuring Operation Priority with Restart".
-
During runtime. See "Configuring Operation Priority Without Restart".
Configuring Operation Priority with Restart
You can configure the messagePriorityConfigurations attributes in the ECE_home/config/management/charging-settings.xml file during installation or when you are ready to restart ECE. To do so:
-
Open your charging-settings.xml file.
-
In the messagePriorityConfigurations section, assign one of the following values to each operation:
- LOW for operations that have the lowest priority and can be rejected first.
- MEDIUM for operations that have moderate priority and can be rejected after the LOW operations.
- HIGH for operations that have the highest priority and are rejected only during extreme overload.
For example:
<messagePriorityConfigurations config-class="oracle.communication.brm.charging.appconfiguration.beans.messages.MessagePriorityConfigurations">
  <messagePriorityConfigurationList config-class="java.util.ArrayList">
    <messagePriorityConfiguration config-class="oracle.communication.brm.charging.appconfiguration.beans.messages.MessagePriorityConfiguration" name="default" operationName1="value" operationName2="value"/>
  </messagePriorityConfigurationList>
</messagePriorityConfigurations>
where operationName1 and operationName2 are the names of the operations. The operations and their default values are prepopulated in the existing file.
-
Restart ECE. See "Starting and Stopping ECE".
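A quick check of the edited priorities can catch typing mistakes before the restart. The following Python sketch scans a messagePriorityConfiguration element and reports any attribute whose value is not LOW, MEDIUM, or HIGH. It is a minimal sketch under stated assumptions: the function name invalid_priorities is hypothetical, operationName1 and operationName2 are the placeholder names used above rather than real operation names, and the sketch assumes that every attribute other than config-class and name represents an operation.

```python
import xml.etree.ElementTree as ET

ALLOWED = {"LOW", "MEDIUM", "HIGH"}
# Attributes assumed not to be operation names.
NON_OPERATION_ATTRS = {"config-class", "name"}

# Placeholder operation names; "URGENT" is a deliberately invalid value.
SAMPLE = """
<messagePriorityConfigurationList>
  <messagePriorityConfiguration name="default"
                                operationName1="LOW"
                                operationName2="URGENT"/>
</messagePriorityConfigurationList>
"""


def invalid_priorities(xml_text):
    """Return {operation name: value} for every value outside ALLOWED."""
    root = ET.fromstring(xml_text)
    invalid = {}
    for node in root.iter("messagePriorityConfiguration"):
        for attr, value in node.attrib.items():
            if attr in NON_OPERATION_ATTRS:
                continue
            if value not in ALLOWED:
                invalid[attr] = value
    return invalid


invalid = invalid_priorities(SAMPLE)
```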
Configuring Operation Priority Without Restart
You can use a JMX editor to configure messagePriorityConfigurations attributes without restarting ECE.
To configure the messagePriorityConfigurations attributes:
-
Access the ECE configuration MBeans in a JMX editor, such as JConsole. See "Accessing ECE Configuration MBeans".
-
Expand the ECE Configuration node.
-
Expand charging.messagePriorityConfigurations.
-
Expand Attributes and select the attribute you want to configure.
-
Set the value of the attribute.