11 Deploying Service Impact Analysis

This chapter describes how to deploy and manage Service Impact Analysis.

Service Impact Analysis Overview

Service Impact Analysis enables you to view the alarm events associated with UIM resources and the impacts of those events on customer, service, network, logical, and physical resources, and on connectivity. It also enables you to assign ownership to specific individuals and track the impact lifecycle through the analysis process. The Alarm Consumer service associates these alarm events with the corresponding UIM resources by consuming TMF642 alarm event JSON messages from the ora-alarm-topology topic.

For the architecture, see "ATA Architecture".

You must deploy ATA before deploying Service Impact Analysis.

Creating Service Impact Analysis Images

You must have created the Service Impact Analysis images as part of "Prerequisites and Configuration for Creating ATA Images" and "Creating ATA Images".

Verify that the following images are available:

  • uim-7.8.0.0.0-alarm-consumer-1.3.0.0.0:latest
  • uim-7.8.0.0.0-impact-analysis-api-1.3.0.0.0:latest

Creating Service Impact Analysis Instance

The Service Impact Analysis instance depends on a deployed ATA instance.

Prerequisites:

  • Deploy ATA using "Creating an ATA Instance".

  • Create the required secrets and other configurations (such as Authentication, Oracle Database schema, SmartSearch) while deploying ATA.

Configuring the applications.yaml File

To configure the applications.yaml file:

  1. Open the applications.yaml file for editing by running the following command:

    vi $SPEC_PATH/$PROJECT/$INSTANCE/applications.yaml
  2. Edit the image names to reflect the Service Impact Analysis image names and location in your docker repository as follows:
    ata:
      name: "ata"
      image:
        alarmConsumerName: uim-7.8.0.0.0-alarm-consumer-1.3.0.0.0
        impactAnalysisApiName: uim-7.8.0.0.0-impact-analysis-api-1.3.0.0.0
        alarmConsumerTag: latest
        impactAnalysisApiTag: latest
        repository:
        repositoryPath:
  3. Edit the applications.yaml file to update the replicaCount of alarmConsumer and impactAnalysisApi according to your performance needs. A sample configuration is as follows:
    alarmConsumer:
      name: "alarm-consumer"
      replicaCount: 3
     
    impactAnalysisApi:
      name: "impact-analysis-api"
      replicaCount: 3

Configuring Service Impact Analysis

This section describes how to configure Service Impact Analysis.

Configuring UIM

Impact correlation for the events submitted through the alarm consumer is performed in UIM, and the results can then be viewed in the Service Impact Analysis UI. See "About SIA" for more information.

Configuring Service Impact Analysis API

Service Impact Analysis provides APIs for persisting and managing events and impact correlation of these events. It uses OpenSearch indexes as its persistence mechanism.

Sample configuration files impactanalysis-static-config.yaml.sample and impactanalysis-dynamic-config.yaml.sample are provided for the Impact Analysis API service under $COMMON_CNTK/charts/ata-app/charts/ata/config/impact-analysis-api.

To override configuration properties, copy the sample static property file to impactanalysis-static-config.yaml and the sample dynamic property file to impactanalysis-dynamic-config.yaml. Provide a key-value pair to override the default value of any specific system configuration property. The properties defined in the property files are passed to the container through Kubernetes configuration maps. Any change to these properties requires the instance to be upgraded. Restart the pods after updating impactanalysis-static-config.yaml.

Date Format

The API serializes and deserializes the date attributes stored in OpenSearch indexes using the following date format. Any modification to this date format must be applied consistently to all consumers of the APIs:

impactanalysis:
  api:
    dateformat: yyyy-MM-dd'T'HH:mm:ss.SSS'Z'
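API consumers that build or read these dates outside Java need an equivalent pattern. The following Python sketch assumes that all dates are in UTC, because 'Z' is a quoted literal in the pattern rather than a time zone offset. The function names are hypothetical.

```python
from datetime import datetime, timezone

# Assumed Python equivalent of the pattern yyyy-MM-dd'T'HH:mm:ss.SSS'Z'.
# 'Z' is a literal in the pattern, so values are converted to UTC first.

def format_sia_date(dt: datetime) -> str:
    """Format a timezone-aware datetime as yyyy-MM-dd'T'HH:mm:ss.SSS'Z'."""
    utc = dt.astimezone(timezone.utc)
    # strftime %f gives microseconds (6 digits); the pattern wants milliseconds.
    return utc.strftime("%Y-%m-%dT%H:%M:%S.") + f"{utc.microsecond // 1000:03d}Z"

def parse_sia_date(text: str) -> datetime:
    """Parse a yyyy-MM-dd'T'HH:mm:ss.SSS'Z' string into an aware UTC datetime."""
    parsed = datetime.strptime(text, "%Y-%m-%dT%H:%M:%S.%fZ")
    return parsed.replace(tzinfo=timezone.utc)
```

For example, format_sia_date(datetime(2024, 5, 1, 10, 30, 15, 123456, tzinfo=timezone.utc)) returns "2024-05-01T10:30:15.123Z"; sub-millisecond precision is truncated.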

Event Status

Service Impact Analysis API supports the following types of events:

  • RAISED: This event type is for new events.
  • UPDATED: This event type is for existing events with updated information.
  • CLEARED: This event type is for events that have been closed.
  • REJECTED: This event type is for events that are invalid and are rejected through Reject Event action available on Service Impact Analysis UI.

    Note:

    Rejected events are moved to a separate index called smartsearch-rejected-event and are not deleted. Rejecting events isolates invalid events that are submitted to the system.

The following event statuses, apart from REJECTED, are standard TMF642 event statuses. These event status mappings are part of the Unified Assurance integration and should not be changed:

impactanalysis:
  event-status:
    CLEARED: CLEARED
    RAISED: RAISED
    UPDATED: UPDATED
    REJECTED: REJECTED

Event Severity

Service Impact Analysis API and ATA support various event severities on a device. From most severe to least severe, the severities are CRITICAL (1), MAJOR (5), WARNING (10), INDETERMINATE (15), MINOR (20), CLEARED (25), and None (999). Internally, the numeric value identifies the position of each severity in the hierarchy. ATA tracks events with the three highest severities (CRITICAL, MAJOR, and WARNING).
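To illustrate how the numeric values order the severities, the following Python sketch picks the most severe value from a list and checks whether a severity is one of the three tracked in ATA. The helper names are hypothetical and the sketch only restates the values listed above; it is not the service's implementation.

```python
# Numeric values listed above: a lower value means a more severe event.
SEVERITY_RANK = {
    "CRITICAL": 1,
    "MAJOR": 5,
    "WARNING": 10,
    "INDETERMINATE": 15,
    "MINOR": 20,
    "CLEARED": 25,
}
NONE_RANK = 999  # used for a missing or unrecognized severity

# The three most severe levels, which ATA tracks.
TRACKED_IN_ATA = {"CRITICAL", "MAJOR", "WARNING"}

def rank(severity):
    """Return the numeric rank of a severity name (case-insensitive)."""
    return SEVERITY_RANK.get((severity or "").upper(), NONE_RANK)

def most_severe(severities):
    """Return the most severe severity in the list, or None if it is empty."""
    return min(severities, key=rank, default=None)

def is_tracked_in_ata(severity):
    """True for CRITICAL, MAJOR, and WARNING."""
    return (severity or "").upper() in TRACKED_IN_ATA
```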

The following event severities are standard TMF642 event severities. These event severity mappings are part of Unified Assurance integration and should not be changed:

impactanalysis:
  severity:
    CLEARED: CLEARED
    INDETERMINATE: INDETERMINATE
    CRITICAL: CRITICAL
    MAJOR: MAJOR
    MINOR: MINOR
    WARNING: WARNING

Impact Calculation Thread Pool Size

The impact calculation thread pool size defines the maximum number of REST requests to UIM for fetching impacts that can be open at the same time:

impact:
  threadPoolSize: 10
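The following Python sketch shows what a bounded pool means in practice: with a pool of 10 workers, at most 10 impact requests are in flight at once and the remaining requests wait in the queue. fetch_impacts is a hypothetical stand-in for the REST call to UIM and does not contact UIM; the tracker only records how many calls overlap.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

THREAD_POOL_SIZE = 10  # mirrors impact.threadPoolSize

class ConcurrencyTracker:
    """Records the highest number of calls running at the same time."""

    def __init__(self):
        self._lock = threading.Lock()
        self.active = 0
        self.peak = 0

    def __enter__(self):
        with self._lock:
            self.active += 1
            self.peak = max(self.peak, self.active)

    def __exit__(self, *exc):
        with self._lock:
            self.active -= 1
        return False  # do not suppress exceptions

tracker = ConcurrencyTracker()

def fetch_impacts(resource_id):
    """Hypothetical stand-in for a REST request to UIM that fetches impacts."""
    with tracker:
        time.sleep(0.01)  # simulated network latency
        return f"impacts-for-{resource_id}"

def fetch_all(resource_ids, pool_size=THREAD_POOL_SIZE):
    # The executor runs at most pool_size calls concurrently, so no more
    # than pool_size requests are open at any time.
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        return list(pool.map(fetch_impacts, resource_ids))
```

Raising the pool size increases parallelism toward UIM, at the cost of more simultaneous load on it.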

SmartSearch and OpenSearch Related Configurations

The configurations related to SmartSearch and OpenSearch are as follows:

  • smartsearch.language: The value of the SmartSearch internal field used in lang pipeline processing. The only supported value is en.
  • smartsearch.tenantId: The value of the SmartSearch internal field for search tenancy. The only supported value is tenant1.
  • smartsearch.fetchSize: Defines the number of documents fetched into memory at a time from SmartSearch for processing during bulk operations. The maximum value is 10000.
  • smartsearch.coolDownInterval: Defines the cool-down interval, in milliseconds, for the OpenSearch index after each batch of documents is processed (bulk updates and deletes).

smartsearch:
  language: en
  tenantId: tenant1
  fetchSize: 1000
  coolDownInterval: 1000
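The following Python sketch shows how fetchSize and coolDownInterval are assumed to interact during a bulk operation: documents are handled in batches of at most fetchSize, with a pause of coolDownInterval milliseconds after each batch. apply_batch is a hypothetical callback standing in for a bulk update or delete against the OpenSearch index.

```python
import time

FETCH_SIZE = 1000             # smartsearch.fetchSize (maximum 10000)
COOL_DOWN_INTERVAL_MS = 1000  # smartsearch.coolDownInterval

def batches(doc_ids, fetch_size):
    """Yield consecutive slices of at most fetch_size document IDs."""
    if not 0 < fetch_size <= 10000:
        raise ValueError("fetchSize must be between 1 and 10000")
    for start in range(0, len(doc_ids), fetch_size):
        yield doc_ids[start:start + fetch_size]

def bulk_process(doc_ids, apply_batch, fetch_size=FETCH_SIZE,
                 cool_down_ms=COOL_DOWN_INTERVAL_MS):
    """Apply a bulk operation per batch and pause after each one."""
    processed = 0
    for batch in batches(doc_ids, fetch_size):
        apply_batch(batch)                # hypothetical bulk update or delete
        processed += len(batch)
        time.sleep(cool_down_ms / 1000)   # give the index time to settle
    return processed
```

A larger fetchSize means fewer, heavier bulk requests; a longer coolDownInterval trades total run time for lower sustained load on OpenSearch.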

Events Related Configurations

The configurations related to events are as follows:

  • event.idPrefix: Defines the prefix used when generating document IDs for the OpenSearch event index.
  • event.reportPrefix: Defines the prefix used when generating report IDs once the analysis status of the event transitions to COMPLETED.
  • event.defaultOwner: Defines the default owner name used when the owner field is not populated during event creation.

event:
  idPrefix: UE
  reportPrefix: REP
  defaultOwner: Unassigned

Configuring Alarm Consumer

Sample configuration files alarm-consumer-static-config.yaml.sample and alarm-consumer-dynamic-config.yaml.sample are provided under $COMMON_CNTK/charts/ata-app/charts/ata/config/alarm-consumer.

To override configuration properties, copy the sample static property file to alarm-consumer-static-config.yaml and the sample dynamic property file to alarm-consumer-dynamic-config.yaml. Provide a key-value pair to override the out-of-the-box default value of any specific system configuration property. The properties defined in the property files are passed to the container through Kubernetes configuration maps. Any change to these properties requires the instance to be upgraded. Restart the pods after updating alarm-consumer-static-config.yaml.

The alarm consumer service receives alarm event notifications from the ora-alarm-topology Kafka topic as JSON strings in TMF642 v5.0 format (a TMF642 alarm JSON wrapped in a TMF688 event JSON). While processing a notification, the alarm consumer creates an alarm and associates it with the affected entity (node or sub-node) in the Service Impact Analysis service. Depending on the event type in the notification, alarms can also be updated, cleared, or deleted from the affected entity.

The default implementation retrieves the entity (device or sub-device) from the inventory (UIM or ATA) by filtering on name and entity type, and then associates the alarm with it. In the alarm object, the alarmedObject.id element contains the name of the entity; the device and sub-device identifiers are separated by the "::" delimiter. The alarmedObject.@referredType element contains the entity type, that is, the TMF639 sub-type of the resource (device or sub-device) on which the alarm is raised. The resource sub-types are listed in the following sections.

The following sections describe the extensions available to configure or customize the entity lookup logic:

  • Resource Type Mappings
  • Customizing Device Mapping
  • Alarmed Object Extension

The following samples show the alarmedObject sub-structure from the TMF642 alarm event specification. For details on the full event payload, see the Active Topology Automator Asynchronous Events Guide.

Sample alarmedObject sub-structure for a device alarm

{
  "eventId": "700001",
  "@type": "AlarmCreateEvent",
  "eventType": "AlarmCreateEvent",
  "event": {
    "alarm": {
      ...
      "alarmedObject": {
        "@referredType": "PhysicalDevice",
        "@type": "AlarmedObjectRef",
        "id": "LSN/EMS_XDM_33/9489"
      },
      ...
    }
  }
}

Sample alarmedObject sub-structure for a sub-device (port) alarm

{
  "eventId": "700001",
  "@type": "AlarmCreateEvent",
  "eventType": "AlarmCreateEvent",
  "event": {
    "alarm": {
      ...
      "alarmedObject": {
        "@referredType": "PhysicalPort",
        "@type": "AlarmedObjectRef",
        "id": "LSN/EMS_XDM_33/9489::LSN/EMS_XDM_33/P01-142.1K.07-Line-Card1.OTU4_8"
      },
      ...
    }
  }
}
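The default parsing applied to these samples (split alarmedObject.id on "::" into node and sub-node, and read @referredType) can be mirrored in Python to check payloads before they are published. This sketch follows the default Groovy script in "Alarmed Object Extension"; it assumes the "..." elisions in the samples have been removed so that the payload is valid JSON, and device_info is a hypothetical helper name.

```python
import json

DELIMITER = "::"  # separates device and sub-device in alarmedObject.id

def device_info(event_json: str) -> dict:
    """Return node, subNode, and referredType the way the default script does."""
    event = json.loads(event_json)
    alarmed = event["event"]["alarm"]["alarmedObject"]
    parts = alarmed["id"].split(DELIMITER)
    return {
        "node": parts[0],
        # Only a two-part identifier yields a sub-node, as in the default script.
        "subNode": parts[1] if len(parts) == 2 else "",
        "referredType": alarmed.get("@referredType", ""),
    }
```

For the port sample, node is LSN/EMS_XDM_33/9489 and subNode is LSN/EMS_XDM_33/P01-142.1K.07-Line-Card1.OTU4_8; for the device sample, subNode is empty.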

Configuring Incoming Channel

To tune performance, uncomment or add the following properties in the alarm-consumer-static-config.yaml file to override the default configuration:

  • Edit max.poll.interval.ms to increase or decrease the maximum delay between invocations of poll() when using consumer group management.
  • Edit max.poll.records to increase or decrease the maximum number of records returned in a single call to poll().
    mp.messaging:
      incoming:
        toFaultChannel:
    #      max.poll.interval.ms: 300000
    #      max.poll.records: 25
        toRetryChannel:
    #      max.poll.interval.ms: 300000
    #      max.poll.records: 25
        toDltChannel:
    #      max.poll.interval.ms: 300000
    #      max.poll.records: 100
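Because the records returned by one poll() call must be processed before max.poll.interval.ms expires (otherwise Kafka considers the consumer failed and rebalances the group), the two settings together bound the average processing time available per record. This small Python calculation uses the sample values above and is only a sizing aid; the actual processing time per alarm depends on your environment.

```python
def per_record_budget_ms(max_poll_interval_ms: int, max_poll_records: int) -> float:
    """Average time available per record so that one poll() batch finishes in time."""
    if max_poll_records <= 0:
        raise ValueError("max.poll.records must be positive")
    return max_poll_interval_ms / max_poll_records

# Sample values from the commented configuration above.
for channel, interval_ms, records in [
    ("toFaultChannel", 300000, 25),
    ("toRetryChannel", 300000, 25),
    ("toDltChannel", 300000, 100),
]:
    print(channel, per_record_budget_ms(interval_ms, records))
```

If alarms take longer to process than this budget, lower max.poll.records or raise max.poll.interval.ms.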

Impact Analysis API

The impactAnalysis.url property specifies the URL of the Impact Analysis API to which the alarm consumer forwards processed alarms:

impactAnalysis:
  url: http://localhost:8084

Resource Type Mappings

The TMF639ResourceType-mappings.yaml file maps protocol-specific resource types to the TMF639 sub-resource types supported by UIM. Update this mappings file only when alarm events are not sent with resource type values that UIM supports.

The alarmedObject.@referredType element value should be a TMF639 sub-resource type that UIM supports. Use this resource mapping extension only when the assurance system sends resource types that UIM cannot interpret.

For example, the assurance system may send protocol-specific object types, such as the CORBA object types OT_EQUIPMENT and OT_MANAGED_ELEMENT.

When an alarm event does not contain UIM's TMF639 sub-resource types, map the protocol-native resource type to its TMF639 sub-resource type in the TMF639ResourceType-mappings.yaml file and upgrade the alarm consumer service.

The TMF639 resource types supported in alarm consumer are: PhysicalDevice, Equipment, PhysicalPort, and DeviceInterface.

The TMF639ResourceType-mappings.yaml file is provided in $COMMON_CNTK/charts/ata-app/charts/ata/config/alarm-consumer for extensibility. The file includes the following out-of-the-box default mappings:

deviceTypeMapping:
  PhysicalDevice:
    - OT_MANAGED_ELEMENT
  Equipment:
    - OT_EQUIPMENT
    - CHASSIS
    - BACKPLANE
    - MODEL
    - RACK
    - SHELF
    - CARD
  PhysicalPort:
    - OT_PHYSICAL_TERMINATION_POINT
    - PORT
    - PTP
  DeviceInterface:
    - OT_CONNECTION_TERMINATION_POINT
    - CTP
  Pipe:
  Connectivity:
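The default mappings can be read as a reverse lookup from a protocol-native type to its TMF639 sub-resource type. The following Python sketch shows that reading under stated assumptions: matching is exact and case-sensitive, values that are already TMF639 types pass through unchanged, and a blank value means PhysicalDevice (as the comment in the default Groovy script states). Raising an error for an unmapped type is a hypothetical choice for illustration.

```python
# Default mappings from TMF639ResourceType-mappings.yaml. Pipe and
# Connectivity have no entries by default, so they are omitted here.
DEVICE_TYPE_MAPPING = {
    "PhysicalDevice": ["OT_MANAGED_ELEMENT"],
    "Equipment": ["OT_EQUIPMENT", "CHASSIS", "BACKPLANE", "MODEL", "RACK", "SHELF", "CARD"],
    "PhysicalPort": ["OT_PHYSICAL_TERMINATION_POINT", "PORT", "PTP"],
    "DeviceInterface": ["OT_CONNECTION_TERMINATION_POINT", "CTP"],
}

# Invert once: protocol-native type -> TMF639 sub-resource type.
NATIVE_TO_TMF639 = {
    native: tmf_type
    for tmf_type, natives in DEVICE_TYPE_MAPPING.items()
    for native in natives
}

def resolve_referred_type(referred_type: str) -> str:
    """Return the TMF639 sub-resource type for an @referredType value."""
    if not referred_type:
        return "PhysicalDevice"       # blank is treated as PhysicalDevice
    if referred_type in DEVICE_TYPE_MAPPING:
        return referred_type          # already a TMF639 type
    try:
        return NATIVE_TO_TMF639[referred_type]
    except KeyError:
        raise ValueError(f"unmapped resource type: {referred_type}") from None
```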

Customizing Device Mapping

By default, the device or sub-device is found by name, using the value of alarmedObject.id. If alarm events carry id or deviceIdentifier values in the alarmedObject.id element, you can update the configuration to look up the device or sub-device by those fields instead.

To use different lookup fields, configure the deviceMapping.inventory.lookupFields sub-structure in the alarm-consumer-static-config.yaml file and upgrade the alarm consumer service.

A sample file is provided in the $COMMON_CNTK/charts/ata-app/charts/ata/config/alarm-consumer directory. If you are modifying it for the first time, rename the alarm-consumer-static-config.yaml.sample file to alarm-consumer-static-config.yaml and update the values. The supported lookup fields are name, id, and deviceIdentifier. A sample sub-structure is as follows:

deviceMapping:
  inventory:
    lookupFields: # The lookup is done according to the provided order. Supported values are name, id & deviceIdentifier
    - name 
    - id
    - deviceIdentifier
  customizeDeviceLookup:
    enabled: false

The above YAML configuration is used to change the device mapping.

Table 11-1 Device Mapping Fields

Field Description
deviceMapping.inventory.lookupFields

An array field that accepts only the values name, id, and deviceIdentifier. These are the names of the UIM entity fields used to search for the device or sub-device identified in the alarmedObject.id field. The default is name; that is, by default the part of alarmedObject.id before the '::' delimiter is matched against the name field.

The lookup follows the order of the array. For example, if the array is set to id, deviceIdentifier, name, the alarm consumer takes the part of alarmedObject.id before '::' and searches the database by id first, because id is the first element of the array.

Only the first three entries of the array are used for searching. For example, with the setting name, id, deviceIdentifier, the alarm consumer searches by name first, then by id if no match is found, and then by deviceIdentifier. Once a device is found, no further matching is performed. If no device or multiple devices are found, see "Fallout Events Resolution for Alarm Consumer" for more information.

deviceMapping.customizeDeviceLookup.enabled

A Boolean value. The default value is false.

If set to true, the alarm consumer runs a user-provided Groovy script that processes the alarmedObject or the complete TMF642 alarm and returns the node identifier, sub-node identifier, and referredType values used to match entities in the database.

A sample Groovy script is provided in "Alarmed Object Extension".
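The ordered lookup described in Table 11-1 can be sketched in Python as follows. The in-memory inventory list and the find_entity helper are hypothetical. The sketch also assumes that a multiple-match result on a field ends the search as a fallout instead of moving on to the next field, which the table does not state explicitly.

```python
ALLOWED_FIELDS = {"name", "id", "deviceIdentifier"}

def find_entity(identifier, lookup_fields, inventory):
    """Search the inventory by each lookup field in order; stop at the first hit.

    Returns the single matching entity, or raises LookupError when no entity
    or more than one entity matches (both cases become fallout events).
    """
    for field in lookup_fields[:3]:  # only the first three entries are used
        if field not in ALLOWED_FIELDS:
            raise ValueError(f"unsupported lookup field: {field}")
        matches = [e for e in inventory if e.get(field) == identifier]
        if len(matches) == 1:
            return matches[0]
        if len(matches) > 1:
            raise LookupError(f"multiple entities match {field}={identifier}")
    raise LookupError(f"no entity matches {identifier}")
```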

Note:

In previous releases, deviceMapping.lookupFields was documented as also accepting ipv4 and ipv6. Starting with release 1.3.0, alarms on sub-nodes are supported and lookupFields no longer supports ipv4 or ipv6. From 1.3.0, the only valid values are name, id, and deviceIdentifier.


Alarmed Object Extension

If the alarmedObject sub-structure has different values or a different format from the samples in the preceding sections, modify the provided Groovy file to parse the sub-structure and return the identifiers. This custom Groovy code runs when the deviceMapping.customizeDeviceLookup.enabled element is set to true in the alarm-consumer-static-config.yaml file.

Update the out-of-the-box Groovy code so that it returns the node and sub-node identifiers and the referredType value. The Groovy file is provided in the $COMMON_CNTK/charts/ata-app/charts/ata/config/alarm-consumer directory. Set deviceMapping.customizeDeviceLookup.enabled to true and upgrade the alarm consumer service.

The sample implementation is as follows:

/*
* Copyright (c) 2024. Oracle and/or its affiliates. All rights reserved.
*/
import groovy.json.JsonSlurper
/**
 * The "delimiter" argument holds the delimiter that separates the device and
 * sub-device identifiers in alarmedObject.id ("::" by default).
 * The "alarmedObject" parameter is the alarmedObject sub-section of the alarm, as a String.
 * The "alarm" parameter is the complete alarm information received, as a String.
 * The return value must be a Map with String keys and String values.
 */
def getDeviceIds(delimiter, alarmedObject, alarm) {
  def jsonSlurper = new JsonSlurper()
  def alo = jsonSlurper.parseText(alarmedObject)
  def aloId = alo.id
  def referredType = alo.'@referredType'
  def device = aloId.split(delimiter)
  def deviceInfo = [:]
  // Custom implementation starts. The following default implementation returns the expected keys.
  // node and subNode hold the values that are searched in the ATA/Inventory databases.
  if (device.size() == 2) {
    // node name
    deviceInfo["node"] = device[0]
    // subNode name
    deviceInfo["subNode"] = device[1]
  } else {
    deviceInfo["node"] = device[0]
    deviceInfo["subNode"] = ""
  }
  // The referredType must match a type in TMF639ResourceType-mappings.yaml. A blank value is treated as PhysicalDevice.
  deviceInfo["referredType"] = referredType
  // Custom implementation ends
  return deviceInfo
}

Note:

The Groovy script from previous releases is not compatible with release 1.3.0 and later. Treat this as a new script that supports alarms on sub-nodes, and rewrite any logic from the previous script in this new Groovy file.

Mounting Groovy Scripts To Alarm Consumer

To mount Groovy scripts to alarm consumer pod:

  1. Edit the script in $COMMON_CNTK/charts/ata-app/charts/ata/config/alarm-consumer/DeviceMapping.groovy location.
  2. Upgrade ATA instance as follows:
    $COMMON_CNTK/scripts/upgrade-applications.sh -p project -i instance -f $SPEC_PATH/project/instance/applications.yaml -a ata

Service Impact Analysis Customer Mappings

See "Impact Analysis Customer Mappings" for more information.

Roles Required for Accessing Service Impact Analysis

For information on roles required for accessing Service Impact Analysis, see "About Authentication".

Deploying Service Impact Analysis Instance

To deploy a Service Impact Analysis instance in your environment using the scripts provided with the toolkit, update the applications.yaml file, configure Service Impact Analysis, and then run the following command to create the instance:

$COMMON_CNTK/scripts/upgrade-applications.sh -p sr -i quick -f $SPEC_PATH/sr/quick/applications.yaml -a ata

Managing Service Impact Analysis Instance

The SIA instance consists of the alarm-consumer and impact-analysis-api services. Update the corresponding sections in the applications.yaml file and follow the relevant steps in "Deploying the Active Topology Automation Service".

Managing Service Impact Analysis Logs

To customize and enable logging, update the logging configuration files for the application as follows:

  1. Customize impact-analysis-api service logs:
    • For service level logs, update the $COMMON_CNTK/charts/ata-app/charts/ata/config/impact-analysis-api/logging-config.xml file.
    • For Helidon-specific logs, update the $COMMON_CNTK/charts/ata-app/charts/ata/config/impact-analysis-api/logging.properties file. By default, the console handler is used. To also write logs to a file, uncomment the following lines and replace the project and instance names (sr-quick in the sample) in the log file location:
    handlers=io.helidon.common.HelidonConsoleHandler,java.util.logging.FileHandler
    java.util.logging.FileHandler.formatter=java.util.logging.SimpleFormatter
    java.util.logging.FileHandler.pattern=/logMount/sr-quick/ata/ata-api/logs/ImpactAnalysisJULMS-%g-%u.log
  2. Customize alarm-consumer service logs as follows:
    • For service level logs, update the $COMMON_CNTK/charts/ata-app/charts/ata/config/alarm-consumer/logging-config.xml file.
    • For Helidon server logs, update the $COMMON_CNTK/charts/ata-app/charts/ata/config/alarm-consumer/logging.properties file.
  3. Once the log configuration files are updated, upgrade the ATA instance. A sample upgrade command is as follows:
    $COMMON_CNTK/scripts/upgrade-applications.sh -p sr -i quick -f $SPEC_PATH/sr/quick/applications.yaml -a ata

Alternate Configuration Options

See "Deploying the Active Topology Automation Service" for more information.

See "Fallout Events Resolution for Alarm Consumer" to resolve the fallout events for the alarms.

Fallout Events Resolution for Alarm Consumer

The following figure illustrates the processing flow of an alarm event (or message) in the alarm consumer.

Figure 11-1 Process Flow of Fallout Events Resolution for Alarm Consumer



Troubleshooting the Alarm Fallouts

Alarm fallouts are alarms that the alarm consumer could not process because of exceptions or errors that occurred during processing. The major fallout scenarios are as follows:

  • Incoming alarm has an invalid JSON structure or invalid TMF642 structure.
  • No device is found to which the incoming alarm can be mapped.
  • Multiple devices are found for the incoming alarm.
  • Processed alarm could not be forwarded to SIA API.

In these fallout scenarios, the alarm consumer is configured to retry processing the same alarm. This helps address possible intermittent issues such as connectivity problems and temporary data unavailability. If the alarm still falls out after the retries, the fallout details, along with the alarm information, are stored in the database. You can review the persisted fallout alarms manually and, during the review, modify or correct the alarm data and send the fallout alarm for reprocessing.

Note:

All fallout alarms are persisted except those with an invalid JSON structure or an invalid TMF642 structure. The alarm consumer drops these non-persisted fallout alarms, and their details cannot be verified later.

The TARGETSYSTEMID of all alarm consumer fallouts is AlarmConsumer. Therefore, use AlarmConsumer as the target service value when running a fallout resolution.

Use the following fallout resolution APIs to review and reprocess alarm consumer fallout alarms:

  • To list all alarm consumer fallout alarms:
    • Method - GET
    • URI - /topology/v2/fallout/events?targetService=AlarmConsumer
  • To find a specific fallout alarm:
    • Method - GET
    • URI - /topology/v2/fallout/events/eid/<eid value>
  • To update fallout alarm data:
    • Method - PUT
    • URI - /topology/v2/fallout/events/eid/<eid value>
    • Request Body - Includes the fallout event details found using the above mentioned APIs.
  • To send for reprocessing:
    • Method - POST
    • URI - /topology/v2/fallout/events/resubmit?state=PENDING&action=NEW&targetService=AlarmConsumer
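The following Python sketch builds the fallout resolution requests listed above with the standard urllib module. BASE_URL is a hypothetical placeholder for your ATA API endpoint, and the sketch omits authentication headers, which your deployment is likely to require (see "About Authentication"). The requests are only constructed here; send each one with urllib.request.urlopen once the required headers are added.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical base URL of the ATA API; replace with your own host and port.
BASE_URL = "http://ata-api.example.com:8080"
FALLOUT = "/topology/v2/fallout/events"

def list_fallouts():
    """GET all alarm consumer fallout alarms."""
    query = urlencode({"targetService": "AlarmConsumer"})
    return Request(f"{BASE_URL}{FALLOUT}?{query}", method="GET")

def get_fallout(eid):
    """GET one fallout alarm by its eid."""
    return Request(f"{BASE_URL}{FALLOUT}/eid/{eid}", method="GET")

def update_fallout(eid, fallout_event: dict):
    """PUT corrected fallout event details (as returned by the GET calls)."""
    body = json.dumps(fallout_event).encode("utf-8")
    return Request(f"{BASE_URL}{FALLOUT}/eid/{eid}", data=body, method="PUT",
                   headers={"Content-Type": "application/json"})

def resubmit_pending():
    """POST to resubmit pending alarm consumer fallouts for reprocessing."""
    query = urlencode({"state": "PENDING", "action": "NEW",
                       "targetService": "AlarmConsumer"})
    return Request(f"{BASE_URL}{FALLOUT}/resubmit?{query}", method="POST")
```

A typical review cycle is list_fallouts, then get_fallout for a specific eid, update_fallout with the corrected data, and finally resubmit_pending.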