Understanding Observability Analytics

Learn how to install, maintain, and interact with Oracle Communications Unified Assurance Observability Analytics. This document describes the architecture of Observability Analytics, its integration with Unified Assurance, installation, scaling, backup and restore strategy, and supported automations. It is intended for trained Unified Assurance administrators and consultants who plan, run, and support an Observability Analytics deployment.

About Observability Analytics

Observability Analytics provides dashboards that help you quickly navigate historical data to determine root causes and document historical events. It also provides a machine learning feedback loop that takes data, processes and analyzes it, and alerts back into the real-time fault engine for event correlation. The anomaly detectors automatically use historical data to spot event anomalies in real time and inject them into Unified Assurance as root cause events. Unified Assurance can then detect and suppress noise based on those root cause events.

Observability Analytics provides:

Observability Analytics Architecture

Observability Analytics functionality involves the three architectural layers of Unified Assurance:

  1. Collection layer: Collectors receive faults in real time from external devices, and receive the anomalies that Observability Analytics identifies based on the real-time events.

  2. Database layer: Faults are stored as events in the Event database and streamed to the Historical database.

  3. Presentation layer: Users interact with the event list, derived from the Event database. They access Observability Analytics dashboards directly or from context menu tools in the event list.

The following figure illustrates the workflow across the architectural layers:

Observability Analytics Architecture Diagram

As shown in the figure, the Observability Analytics workflow is:

  1. Devices and systems send faults into the collection servers, where they are processed through rules and normalized.

  2. Events are stored in the real-time Event database.

  3. The MySQL Replication Data Importer service takes data from all new and updated events and new journal entries and copies it to the Historical database.

  4. Anomaly Detection analyzes the historical data and identifies anomalies.

  5. Alerting monitors send anomalies to the Event WebHook Aggregator.

  6. The Event WebHook Aggregator creates new events for the detected anomalies.

  7. Policies in Custom Action Policy Engine (CAPE) process the anomalies to correlate, enrich, and suppress events.

  8. Users interact with the event list in the Unified Assurance UI. They can manually adjust events, for example, to change their severity, acknowledge them, or delete them.

  9. Users access historical events in the Observability Analytics dashboards inside the Unified Assurance UI either by opening the dashboards directly, or by selecting context menu tools in the real-time event list.
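The feedback loop in steps 4 through 6 can be illustrated with a small, self-contained Python sketch. It is not how Observability Analytics is implemented: the real detectors run in the OpenSearch Anomaly Detection plugin, which uses the Random Cut Forest algorithm, whereas this sketch swaps in a simple z-score test on hourly event counts. The `Event` class, the `detect_anomalies` and `to_anomaly_events` functions, and their fields are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class Event:
    # Hypothetical, minimal event shape; real Unified Assurance events have many more fields.
    source: str
    summary: str


def detect_anomalies(hourly_counts: list[int], threshold: float = 3.0) -> list[int]:
    """Stand-in for step 4: return the indexes of hours whose event count lies more
    than `threshold` population standard deviations away from the mean."""
    mu = mean(hourly_counts)
    sigma = pstdev(hourly_counts)
    if sigma == 0:
        return []  # perfectly flat history: nothing stands out
    return [hour for hour, count in enumerate(hourly_counts)
            if abs(count - mu) / sigma > threshold]


def to_anomaly_events(anomalous_hours: list[int]) -> list[Event]:
    """Stand-in for steps 5 and 6: turn each detected anomaly into a new event
    that would be fed back into the real-time Event database."""
    return [Event(source="ObservabilityAnalytics",
                  summary=f"Anomalous event volume in hour {hour}")
            for hour in anomalous_hours]


if __name__ == "__main__":
    # 23 quiet hours followed by a fault storm in the last hour.
    history = [10] * 23 + [200]
    for event in to_anomaly_events(detect_anomalies(history)):
        print(event.summary)
```

Running the example prints a single line for hour 23, the fault storm. In the product, the resulting events then pass through CAPE policies (step 7) for correlation, enrichment, and suppression.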

Key Observability Analytics Components

Observability Analytics is made up of the following key components:

Installing and Configuring Observability Analytics

You install and configure Observability Analytics by running AnalyticsWizard as part of the Unified Assurance installation process. See Overview in Unified Assurance Installation Guide for information about the different Unified Assurance installation configurations.

Post-Install Actions

After installing Unified Assurance and running AnalyticsWizard, perform the following steps to enable Observability Analytics:

  1. Enable the Event WebHook Aggregator on the collection servers.

    On redundant systems, enable the primary WebHook Aggregator on the primary collection server and enable a redundant WebHook Aggregator on the redundant collection server.

    See Services in Unified Assurance User's Guide for information about the UI for enabling services.

  2. Configure the Webhook URL in the OpenSearch notification channels with the correct host:

    1. From the Analytics menu, select Events, then Administration, and then Management.

    2. Click Notifications.

    3. Do the following for the assure1_notification and log_watcher channels:

      1. Select the channel.

      2. From the Actions menu, select Edit.

      3. Replace localhost in Webhook URL with the actual Unified Assurance host FQDN.

      4. Click Save.

  3. Enable the default CAPE functionality:

    1. Enable the following default CAPE nodes:

      • EscalateByAnomaly

      • NotifyByAnomaly

      • SuppressByAnomaly

      See Nodes in Unified Assurance User's Guide for information about the UI for enabling CAPE nodes.

      See Unified Assurance Automation Policies for Anomaly Detection for information about what these nodes do.

    2. Enable the following default CAPE policies:

      • AbnormalActivity (runs the EscalateByAnomaly node)

      • FaultStormDips (runs the SuppressByAnomaly node)

      • OperationPerformance (runs the NotifyByAnomaly node)

      See Policies in Unified Assurance User's Guide for information about the UI for enabling CAPE policies.

    3. Enable the Custom Action Policy Engine service.

      See Services in Unified Assurance User's Guide for information about the UI for enabling services.

  4. Start the anomaly detectors applicable to your environment and use cases. When you start anomaly detectors, they begin training the machine learning model on your indexes. You can run them on a range of historical data to train the model initially, and then set them to run continuously.

    To run anomaly detectors on historical data:

    1. From the main navigation menu, select Analytics, then Events, then Home.

    2. From the OpenSearch menu, under OpenSearch Plugins, select Anomaly Detection.

    3. From the list on the left, select Detectors.

    4. Click a detector.

    5. Select the Historical analysis tab.

    6. Click Run historical analysis.

    7. Select a time range. Oracle recommends a range of at least two months.

    8. Click Run historical analysis.

    After the model has trained on the historical data, you can start the detectors to run on live data by clicking Start detector on the Real-time results tab.

    See Anomaly Detection for Observability Analytics for information about the different anomaly detectors.
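Steps 2 and 4 can also be scripted against the OpenSearch REST APIs instead of the UI. The following Python sketch only builds the requests (method, path, body) so that it runs without a cluster. It assumes the OpenSearch Notifications plugin endpoint `_plugins/_notifications/configs/<config_id>` and the Anomaly Detection plugin endpoint `_plugins/_anomaly_detection/detectors/<detector_id>/_start`. The channel and detector IDs, the example FQDN, and the Webhook URL are hypothetical; channel IDs are not necessarily the same as the display names assure1_notification and log_watcher.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlsplit, urlunsplit


def with_fqdn(webhook_url: str, fqdn: str) -> str:
    """Step 2: swap the host of a Webhook URL (for example, localhost) for the
    Unified Assurance FQDN, keeping the scheme, port, path, and query."""
    parts = urlsplit(webhook_url)
    netloc = fqdn if parts.port is None else f"{fqdn}:{parts.port}"
    return urlunsplit(parts._replace(netloc=netloc))


def channel_update_request(channel_id: str, config: dict, fqdn: str) -> tuple[str, str, dict]:
    """Step 2: build a Notifications plugin update for one channel. `config` is the
    "config" object returned by GET _plugins/_notifications/configs/<channel_id>."""
    webhook = {**config["webhook"], "url": with_fqdn(config["webhook"]["url"], fqdn)}
    body = {"config": {**config, "webhook": webhook}}
    return "PUT", f"/_plugins/_notifications/configs/{channel_id}", body


def historical_analysis_request(detector_id: str, end: datetime,
                                days: int = 62) -> tuple[str, str, dict]:
    """Step 4: build an Anomaly Detection plugin request that runs historical
    analysis over the `days` before `end`. 62 days covers any two calendar months."""
    start = end - timedelta(days=days)
    body = {
        "start_time": int(start.timestamp() * 1000),  # epoch milliseconds
        "end_time": int(end.timestamp() * 1000),
    }
    return "POST", f"/_plugins/_anomaly_detection/detectors/{detector_id}/_start", body


if __name__ == "__main__":
    # Hypothetical channel config and IDs, for illustration only.
    channel = {
        "name": "assure1_notification",
        "config_type": "webhook",
        "is_enabled": True,
        "webhook": {"url": "http://localhost:8080/hook"},
    }
    print(channel_update_request("CHANNEL_ID", channel, "ua.example.com"))
    print(historical_analysis_request("DETECTOR_ID", datetime.now(timezone.utc)))
```

Each tuple can be sent with any HTTP client that is authenticated against the OpenSearch cluster. In the OpenSearch Anomaly Detection API, calling the same `_start` endpoint without a body starts the real-time job, which corresponds to clicking Start detector.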

Scaling Observability Analytics

To prepare and plan for the environment before installation, Oracle recommends running a scale calculator that includes the following considerations:

Backing Up and Restoring Observability Analytics Data

Oracle recommends that you back up all data and configurations regularly. Backups must be available, secure, and easy to use for restoration. You can use OpenSearch snapshots to back up and restore the Observability Analytics data, and you can automate snapshots with OpenSearch Snapshot Management.

See Backing Up and Restoring an OpenSearch Database and Snapshots in the OpenSearch documentation for more information.
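As a starting point for automating snapshots, the following Python sketch builds a request body for the OpenSearch Snapshot Management API (`POST _plugins/_sm/policies/<policy_name>`). It assumes that a snapshot repository has already been registered with `PUT _snapshot/<repository>`. The repository name, cron schedules, and retention values are hypothetical defaults; choose them according to your own recovery requirements.

```python
import json


def snapshot_policy(repository: str, create_cron: str = "0 2 * * *",
                    delete_cron: str = "0 3 * * *", max_age: str = "30d",
                    min_count: int = 7, tz: str = "UTC") -> dict:
    """Build a Snapshot Management policy that takes a snapshot on `create_cron`
    and, on `delete_cron`, prunes snapshots older than `max_age` while always
    keeping at least `min_count` of them."""
    return {
        "description": "Scheduled Observability Analytics snapshots",
        "creation": {
            "schedule": {"cron": {"expression": create_cron, "timezone": tz}},
        },
        "deletion": {
            "schedule": {"cron": {"expression": delete_cron, "timezone": tz}},
            "condition": {"max_age": max_age, "min_count": min_count},
        },
        "snapshot_config": {
            "repository": repository,  # must already be registered
            "indices": "*",            # snapshot all indexes
        },
    }


if __name__ == "__main__":
    # Hypothetical repository name, for illustration only.
    print(json.dumps(snapshot_policy("ua-backups"), indent=2))
```

Send the printed JSON as the body of the Snapshot Management create-policy request. Restoring still uses the standard OpenSearch snapshot restore API against the same repository.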

Anomaly Detection for Observability Analytics

Observability Analytics includes anomaly detectors for the following areas:

To see the anomaly detectors and dashboards:

  1. From the main navigation menu, select Analytics, then Events, then Home.

  2. From the OpenSearch menu, under OpenSearch Plugins, select Anomaly Detection.

  3. Do one of the following:

    • To see the anomaly dashboards, select Dashboard.

      See Step 5: Observing the results in the OpenSearch anomaly detection documentation for information about interacting with anomaly dashboards.

    • To see the list of anomaly detectors, select Detectors.

      You can filter the list by entering text in the Search bar. For example, to see only abnormal activity detectors, enter abnact.

Unified Assurance Automation Policies for Anomaly Detection

Observability Analytics uses alerting monitors to add the anomalies it detects as new real-time events in the Event database. You can use the CAPE service and default nodes and policies to automatically perform encapsulation, correlation, escalation, and notification for the new events.
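Because each default policy runs a specific node, the post-install actions enable both the nodes and the policies that run them. The following hypothetical Python check encodes the policy-to-node pairs listed in Post-Install Actions and reports any enabled policy whose node is not enabled. It is a planning aid, not a Unified Assurance API. The enabled sets would come from the Nodes and Policies pages in the UI, and the check does not confirm that the Custom Action Policy Engine service itself is running.

```python
# Default CAPE policies and the node each one runs, as listed in Post-Install Actions.
POLICY_NODES = {
    "AbnormalActivity": "EscalateByAnomaly",
    "FaultStormDips": "SuppressByAnomaly",
    "OperationPerformance": "NotifyByAnomaly",
}


def missing_nodes(enabled_policies: set[str], enabled_nodes: set[str]) -> dict[str, str]:
    """Return {policy: node} for every enabled default policy whose node is not enabled."""
    return {policy: node for policy, node in POLICY_NODES.items()
            if policy in enabled_policies and node not in enabled_nodes}


if __name__ == "__main__":
    # Example: FaultStormDips is enabled, but SuppressByAnomaly is not.
    print(missing_nodes({"AbnormalActivity", "FaultStormDips"}, {"EscalateByAnomaly"}))
```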

Observability Analytics uses the following default CAPE nodes: