Rule Set Development
Before creating incident rules/rule sets, the first step is to strategically determine when incidents should be created based on the business requirements of your organization. Important questions to consider are:
- What events should create incidents? Which service disruptions need to be tracked and resolved by IT administrators?
- Which administrators should be notified for incoming events or incidents?
- Are any of the events or incidents being forwarded to external systems (such as a helpdesk ticketing system)?
Example 5-1 Example Rule Set
- Rule Set applies to target: Group Target G
- Rules in the Rule Set:
  - Rule(s) to create incidents for specified events
  - Rule(s) that send notifications on incidents
  - Rule(s) that escalate incidents based on some condition, for example, the length of time an incident is open
Example 5-2 Example Rule Set in Greater Detail
- Rule Set for Production Group G
- Target: Production Group G
- Rule 1: Create an incident for all target down events.
- Rule 2: Create an incident for specific database, host, and WebLogic Server metric alert events of critical or warning severity.
- Rule 3: Create an incident for any problem job events.
- Rule 4: For all critical incidents, send a page. For all warning incidents, send email.
- Rule 5: If a Fatal incident is open for more than 12 hours, set the escalation level to 1 and email a manager.
Once the exact business requirements are understood, you translate them into enterprise rule sets. Adhering to the following guidelines results in efficient use of system resources as well as operational efficiency.
- For rule sets that operate on targets (for example, hosts and databases), use groups to consolidate targets into a smaller number of monitoring entities for the rule set. Groups should be composed of targets that have similar monitoring requirements, including incident management and response.
- All the rules that apply to the same groups of targets should be consolidated into one rule set. You can create multiple rules that apply to the targets in the rule set. You can create rules for events specific to an event class, rules that apply to events of a specific event class and target type, or rules that apply to incidents on these targets.
- Leverage the execution order of rules within the rule set. Rule sets and rules within a rule set are executed in sequential order. Therefore, ensure that rules and rule sets are sequenced with that in mind.
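The effect of sequential execution can be sketched with a small model. This is an illustrative Python sketch only, not Enterprise Manager code or its API: the `Context` and `Rule` classes, the `run_rule_set` function, and the event fields are all hypothetical names chosen for the example. It assumes a notification rule can act only on incidents that an earlier rule has already created.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Context:
    """Hypothetical state shared by the rules while one event is processed."""
    event: dict
    incidents: List[dict] = field(default_factory=list)
    log: List[str] = field(default_factory=list)


@dataclass
class Rule:
    name: str
    action: Callable[[Context], None]


def create_incident(ctx: Context) -> None:
    # Incident creation: in this sketch it applies to "target down" events only.
    if ctx.event["type"] == "target_down":
        ctx.incidents.append({"target": ctx.event["target"]})
        ctx.log.append("incident created")


def notify(ctx: Context) -> None:
    # Notification: acts only on incidents that already exist.
    for incident in ctx.incidents:
        ctx.log.append(f"notified for {incident['target']}")


def run_rule_set(rules: List[Rule], event: dict) -> List[str]:
    """Apply the rules one after another, in list order."""
    ctx = Context(event=event)
    for rule in rules:
        rule.action(ctx)
    return ctx.log


event = {"type": "target_down", "target": "DB1"}
creation = Rule("Create incident", create_incident)
notification = Rule("Notify", notify)

print(run_rule_set([creation, notification], event))
# ['incident created', 'notified for DB1']
print(run_rule_set([notification, creation], event))
# ['incident created']
```

In the second call the notification rule runs before any incident exists, so nothing is sent. In this simplified model, that is why creation rules are placed ahead of notification and escalation rules.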
When creating a new rule, you choose which object the rule applies to: events, incidents, or problems. Use the following rule usage guidelines to help guide your selection.
Table 5-7 Rule Usage Guidelines
Rule Usage | Application |
---|---|
Rules on Events | Create incidents for the events managed in Enterprise Manager. Send notifications on events. Create tickets for incidents managed by helpdesk analysts: first create an incident for the event, then create a ticket for the incident. Send events to third-party management systems. |
Rules on Incidents | Automate management of incident workflow operations (assign owner, set priority, set escalation levels) and send notifications. Create tickets based on incident conditions. For example, create a ticket if the incident is escalated to level 2. |
Rules on Problems | Automate management of problem workflow operations (assign owner, set priority, set escalation levels) and send notifications. |
Rule Set Example
The following example illustrates many of the implementation guidelines just discussed. All targets have been consolidated into a single group, all rules that apply to group members are part of the same rule set, and the execution order of the rules has been set. In this example, the rule set applies to a group (Production Group G) that consists of the following targets:
- DB1 (database)
- Host1 (host)
- WLS1 (WebLogic Server)
All rules in the rule set perform three types of actions: incident creation, notification, and escalation.
In a more detailed view of the rule set, we can see how the guidelines have been followed.
In this detailed view, there are five rules that apply to all group members. The execution sequence of the rules (Rule 1 through Rule 5) has been arranged to correspond to the three types of rule actions in the rule set:
- Rules 1-3: Incident Creation
- Rule 4: Notification
- Rule 5: Escalation
By synchronizing rule execution order with the progression of rule action categories, execution efficiency is achieved. As shown in this example, by using conditional actions that take different actions for the same set of events based on severity, it is easier to change the event selection criteria in the future without having to change multiple rules. Note: This assumes that the action requirements for all incidents (from rules 1 - 3) are the same.
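The benefit of conditional actions can be sketched as follows. This is a hypothetical Python model, not Enterprise Manager configuration: the `ACTIONS_BY_SEVERITY` mapping, the `notification_action` function, and the sample incidents (including the severities assigned to them) are assumptions made for illustration. A single notification rule picks its action from the incident severity, regardless of which creation rule produced the incident.

```python
# Hypothetical severity-to-action mapping for one notification rule (Rule 4).
ACTIONS_BY_SEVERITY = {
    "Critical": "send page",
    "Warning": "send email",
}


def notification_action(incident):
    """Return the action for an incident, or None if no condition matches."""
    return ACTIONS_BY_SEVERITY.get(incident["severity"])


# Sample incidents; sources and severities are made up for the example.
incidents = [
    {"source": "Rule 1", "target": "DB1", "severity": "Critical"},
    {"source": "Rule 2", "target": "Host1", "severity": "Warning"},
    {"source": "Rule 3", "target": "WLS1", "severity": "Critical"},
]

for incident in incidents:
    print(incident["target"], "->", notification_action(incident))
```

Because the notification logic lives in one place, changing which events create incidents (Rules 1 through 3) does not require touching the notification rule in this model.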
The following table illustrates explicit rule set operation for this example. All targets are within Production Group G.
Table 5-8 Example Rule Set for Production Group G
Rule Name | Execution Order | Criteria | Triggering Condition | Actions |
---|---|---|---|---|
Rule 1 | First | DB1 goes down. Host1 goes down. WLS1 goes down. | N/A | Create incident. |
Rule 2 | Second | DB1 Tablespace Full (%); Host1 CPU Utilization (%); WLS1 Heap Usage (%). Note: The warning and critical thresholds are defined in Metric and Policy settings, not from the rules UI. | If severity=Warning or severity=Critical | Create incident. |
Rule 3 | Third | Event generated for problem job status changes for DB1, Host1, and WLS1. | N/A | Create incident. |
Rule 4 | Fourth | All incidents for Production Group G | Severity=Warning; Severity=Critical | Send email (Warning); send page (Critical) |
Rule 5 | Fifth | Incident remains open for more than 12 days. | Status=Fatal | Increase escalation level to 1. |
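
The time-based condition of Rule 5 can be sketched as a simple check. This is a minimal Python illustration, not how Enterprise Manager evaluates rules: the `escalation_level` function and the incident fields are hypothetical, and the threshold is passed as a parameter so that any duration can be substituted. The example uses the 12-day value from the table.

```python
from datetime import datetime, timedelta


def escalation_level(incident, now, threshold):
    """Return at least 1 if a Fatal incident has been open longer than
    `threshold`; otherwise return the current level unchanged."""
    open_for = now - incident["opened_at"]
    if incident["severity"] == "Fatal" and open_for > threshold:
        return max(incident["escalation_level"], 1)
    return incident["escalation_level"]


now = datetime(2024, 1, 20, 9, 0)
threshold = timedelta(days=12)  # duration taken from the Rule 5 row of the table

# Hypothetical incidents: one open for 19 days, one open for 5 days.
old_fatal = {"severity": "Fatal", "opened_at": datetime(2024, 1, 1, 9, 0), "escalation_level": 0}
new_fatal = {"severity": "Fatal", "opened_at": datetime(2024, 1, 15, 9, 0), "escalation_level": 0}

print(escalation_level(old_fatal, now, threshold))  # 1
print(escalation_level(new_fatal, now, threshold))  # 0
```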