Scalability and Redundancy

Oracle Communications Unified Assurance is a highly scalable platform that can range from a single server to dozens of horizontally scalable, inter-dependent servers. A single instance of Unified Assurance supports your organization's growing needs without requiring you to install and integrate additional instances.

Each tier of the Unified Assurance product is scalable and redundant through several means. This chapter describes each tier and component of the software.

Presentation Tier Scalability and Redundancy

The presentation tier consists of the internal presentation servers and optional external presentation servers for use in secured environments such as demilitarized zones (DMZs).

Internal Presentation Server Scalability and Redundancy

The internal presentation servers encompass the web user interface, the API, rules repository, image repository, package repository, dashboard integrations, and message bus. The first server installed is considered the primary presentation server, and you can install one additional redundant internal presentation server. The internal presentation servers are considered stateful because the Unified Assurance management database is collocated with them.

Redundancy of the internal presentation servers is provided as follows:

The following figure illustrates a redundant deployment for the internal presentation servers using a load balancer.

Figure: Presentation Redundancy (presentation-redundancy.png)

Users access the internal presentation servers through the web fully qualified domain name set in the WebFQDN configuration option. In a single-server development instance, the WebFQDN is commonly the same as the HostFQDN, but you can set the WebFQDN to any valid FQDN as a vanity URL. Be careful when choosing a WebFQDN that is different from the HostFQDN, especially when redundancy is enabled. As a best practice, redundant pairs should share a WebFQDN.

Your internal and external presentation servers can share the same WebFQDN, or they can use a different one. In redundant environments with a VIP, the redundant pair of internal presentation servers should share a WebFQDN, and the redundant pair of external presentation servers should share a WebFQDN, but the internal and external WebFQDNs can be different.

In environments without a VIP, the internal presentation servers can each have a separate WebFQDN, but several components (SVN, API calls, image repository, package repository) can only support a single destination. Because of this, you can only achieve partial presentation server failover with separate WebFQDNs, so changes should be minimized.
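The WebFQDN guidance above can be checked mechanically. The following is a minimal sketch, not part of Unified Assurance: the PresentationServer record, the tier labels, and the example host names are all hypothetical. It reports which presentation tiers have servers that do not share a single WebFQDN.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PresentationServer:
    """Hypothetical inventory record; not a Unified Assurance data structure."""
    host_fqdn: str  # the server's own host name
    web_fqdn: str   # the name users enter in the browser
    tier: str       # "internal" or "external"


def web_fqdns_by_tier(servers):
    """Collect the distinct WebFQDNs in use for each presentation tier."""
    grouped = defaultdict(set)
    for server in servers:
        grouped[server.tier].add(server.web_fqdn.lower())
    return dict(grouped)


def tiers_with_split_web_fqdn(servers):
    """Return the tiers whose servers do not all share one WebFQDN."""
    grouped = web_fqdns_by_tier(servers)
    return sorted(tier for tier, names in grouped.items() if len(names) > 1)


# Example inventory with made-up names: the internal pair shares a WebFQDN,
# the external pair does not.
servers = [
    PresentationServer("pres1.example.com", "assure.example.com", "internal"),
    PresentationServer("pres2.example.com", "assure.example.com", "internal"),
    PresentationServer("dmz1.example.com", "portal.example.com", "external"),
    PresentationServer("dmz2.example.com", "portal2.example.com", "external"),
]
print(tiers_with_split_web_fqdn(servers))
```

In this example only the external tier is reported. Internal and external tiers using different WebFQDNs from each other is not flagged, because the text allows it.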

External Presentation Server Scalability and Redundancy

If you have multitenant configurations, or you need to scale the presentation layer beyond two servers, you can install one or more external presentation servers. These servers do not contain the stateful Unified Assurance management database and are a good fit to be deployed in a DMZ for public exposure.

Redundancy of the external presentation servers is the same as that for internal presentation servers, without the multi-source MySQL replication, which only applies to the database.

Database Tier Scalability and Redundancy

The database tier consists of the following databases, each of which supports scalability and redundancy in different ways:

Event Database Scalability and Redundancy

The Event database is stored in the MySQL Relational Database Management System (RDBMS).

Historical Database Scalability and Redundancy

The Historical database is stored in OpenSearch, and includes index templates for Observability Analytics, Flow Analytics, and Log Analytics.

The following figure illustrates a deployment for the Historical database with three-node redundant clusters that receive event data from the Event databases and log and flow data from other Unified Assurance servers. Each Historical database node, like every other server in the environment, also runs redundant instances of Fluentbit that collect logs and send them to Fluentd instances on both clusters. These instances are omitted from the figure for simplicity.

Figure: Historical Database Redundancy (historical-database-redundancy.png)

Metric Database Scalability and Redundancy

The Metric database is stored in InfluxDB, a time-series database.

The following figure illustrates a redundant deployment for the Metric database.

Figure: Metric Database Redundancy (metric-database-redundancy.png)

Graph Database Scalability and Redundancy

The Graph database is stored in Neo4j, a NoSQL document and graph database.

SOA Collection and Processing Tiers Scalability and Redundancy

The service-oriented applications (SOA) on the collection and processing servers are managed by a local instance of the Unified Assurance Broker. Every minute, the Broker checks that enabled services are running, and it starts jobs based on their cron schedules. You can add multiple collection and processing servers to a single Unified Assurance instance to horizontally scale the processing and analytics of data.

About General Broker Failover

Each Broker sends a heartbeat message to the RabbitMQ message bus every second, broadcasting it to all other Brokers. As a result, each Broker knows whether every other Broker is alive or dead. The following considerations determine when to make failover or clustering changes:

About Job Failover

You can configure redundant scheduled jobs for failover. Use the Jobs UI to apply the same application configuration for both jobs. When configuring the redundant job, you choose its corresponding primary job. The primary and redundant jobs must be the same application type and located on separate servers.

The primary job will always be run if the Broker running that job is online. If the redundant Broker for the job cannot reach its primary Broker, the redundant job is run. In a split-brain scenario, where Brokers may still be running but unable to communicate with each other, both jobs will run. How conflicts are handled is up to each application; the default jobs are configured to minimize conflicts.
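The job failover rule above can be sketched in a few lines. This is an illustration only, not the Broker's implementation: the BrokerView class, the should_run_job function, and the 5-second HEARTBEAT_TIMEOUT_SECONDS value are assumptions made for the example. The text states only that heartbeats are sent every second.

```python
import time

# Assumption: how long without a heartbeat before a peer counts as dead.
# This value is illustrative and is not a documented setting.
HEARTBEAT_TIMEOUT_SECONDS = 5.0


class BrokerView:
    """One Broker's view of its peers, built from the heartbeats it receives."""

    def __init__(self, name):
        self.name = name
        self.last_seen = {}  # peer name -> time of the last heartbeat received

    def record_heartbeat(self, peer, now=None):
        self.last_seen[peer] = time.monotonic() if now is None else now

    def is_alive(self, peer, now=None):
        now = time.monotonic() if now is None else now
        seen = self.last_seen.get(peer)
        return seen is not None and (now - seen) <= HEARTBEAT_TIMEOUT_SECONDS


def should_run_job(view, role, primary_broker, now=None):
    """Decide whether the local Broker runs its copy of a scheduled job."""
    if role == "primary":
        # The primary job always runs while its own Broker is online.
        return True
    # The redundant job runs only when the primary Broker cannot be reached.
    return not view.is_alive(primary_broker, now)


redundant = BrokerView("broker-b")
redundant.record_heartbeat("broker-a", now=100.0)
print(should_run_job(redundant, "redundant", "broker-a", now=101.0))  # False
print(should_run_job(redundant, "redundant", "broker-a", now=110.0))  # True
```

The same logic shows why both jobs run in a split-brain scenario: the primary side always returns True, and the redundant side also returns True once it stops hearing heartbeats, even though the primary Broker is still running.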

About Service Failover

You can configure redundant services for failover. Use the Services UI to apply the same application configuration for both services. When configuring the redundant service, you choose its corresponding primary service.

Unlike redundant jobs, both services run at the same time. Each application requests its state from the local Broker and handles failover internally. The active service (usually the primary) performs all the necessary work, just like a standalone service. The standby service (usually the redundant) configures itself and holds off on processing data. If the standby service passively receives data, it logs the collection, buffers a small configured amount, and then discards the oldest data. If the standby service actively collects data, it does not perform any action during its poll time. When the redundant Broker for the service cannot reach its primary Broker, the redundant service becomes active, and collection or polling proceeds as normal.
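The standby behavior for passively received data (keep a small configured amount, discard the oldest) matches a bounded first-in, first-out buffer. The following sketch assumes that interpretation. The StandbyBuffer class and its method names are hypothetical, and the text does not say what the product does with the buffered data when the standby becomes active.

```python
from collections import deque


class StandbyBuffer:
    """Bounded buffer that drops the oldest message once it is full."""

    def __init__(self, capacity):
        # deque(maxlen=...) silently evicts from the left when full.
        self._items = deque(maxlen=capacity)
        self.discarded = 0

    def receive(self, message):
        # Count the eviction that the append below is about to cause.
        if len(self._items) == self._items.maxlen:
            self.discarded += 1
        self._items.append(message)

    def contents(self):
        """Return the buffered messages, oldest first."""
        return list(self._items)


buffer = StandbyBuffer(capacity=3)
for i in range(5):
    buffer.receive(f"event-{i}")

print(buffer.contents())   # ['event-2', 'event-3', 'event-4']
print(buffer.discarded)    # 2
```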

During rolling maintenance or updates, the primary Broker can tell the redundant service to immediately become active. In this case only, the primary Broker stays running while its primary service is stopped.

About Clustering Services

You can cluster services to spread the responsibility for work across multiple applications. You can only cluster services for applications that actively poll data.

Clustered applications can run anywhere, with several on the same server or on separate servers. Use the Services UI to apply the same application configuration to every service in the cluster. Each service has a cluster ID, which is a numeric value appended to the name of the application binary. You can run multiple clusters of the same application type on separate cluster IDs. Services in a cluster communicate with their local Brokers to determine the current number of applications in the cluster. The number of members determines how the work is divided among them (for example, by splitting processing based on devices). When members join or leave the cluster, the division of work is recalculated for the next poll cycle.
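One way to picture the division of work is a stable hash of each device name taken modulo the member count. This is a sketch under that assumption. The text does not describe the actual algorithm, and the function names and the idea that each member knows its own index are hypothetical.

```python
import zlib


def owner_index(device, member_count):
    """Map a device name to a member index in the range 0..member_count-1."""
    # crc32 is stable across processes, unlike Python's built-in hash() for str.
    return zlib.crc32(device.encode("utf-8")) % member_count


def devices_for_member(devices, member_index, member_count):
    """Return the devices this member polls in the next cycle."""
    return [d for d in devices if owner_index(d, member_count) == member_index]


devices = [f"router-{n}.example.com" for n in range(10)]

# With three members, every device is assigned to exactly one member.
for index in range(3):
    print(index, devices_for_member(devices, index, 3))

# When a member leaves, the same function is re-evaluated with the new count.
for index in range(2):
    print(index, devices_for_member(devices, index, 2))
```

With simple modulo assignment, a membership change can move many devices between members. That is acceptable here because the text says the division is recalculated for the next poll cycle and not in the middle of a cycle.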

Microservice Collection and Processing Tiers Scalability and Redundancy

Microservices run in the Kubernetes container orchestration platform. You can configure microservices to run as independent pods, in deployments, stateful sets, or daemon sets. Under the Kubernetes declarative model, you declare the desired number and placement of microservices, and Kubernetes attempts to satisfy that declaration as far as the available resources allow. When a cluster contains multiple servers and one of them has an outage, failover is handled automatically by moving the microservices to another server, if possible. This behavior has many configuration options. For more information, see the Kubernetes documentation. To understand more about microservices in Unified Assurance, see Understanding Microservices.
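The declarative idea can be illustrated with a toy reconciliation function. This is a simplified model for explanation only. It is not Kubernetes code and does not reflect the real scheduler. The reconcile function, the pod and node names, and the fixed per-node capacity are all assumptions.

```python
def reconcile(desired_replicas, placements, healthy_nodes, capacity_per_node):
    """Return a pod -> node placement that tries to reach desired_replicas.

    Pods on nodes that are no longer healthy are rescheduled onto healthy
    nodes with spare capacity. Pods that cannot be placed stay unscheduled.
    """
    # Keep every pod that is already running on a healthy node.
    result = {pod: node for pod, node in placements.items() if node in healthy_nodes}
    load = {node: 0 for node in healthy_nodes}
    for node in result.values():
        load[node] += 1

    for i in range(desired_replicas):
        pod = f"pod-{i}"
        if pod in result:
            continue
        candidates = [n for n in sorted(healthy_nodes) if load[n] < capacity_per_node]
        if not candidates:
            break  # not enough resources; the remaining pods are left pending
        target = min(candidates, key=lambda n: load[n])  # least-loaded node
        result[pod] = target
        load[target] += 1
    return result


before = {"pod-0": "node-a", "pod-1": "node-b", "pod-2": "node-b"}

# node-b fails; node-a and node-c remain healthy.
after = reconcile(3, before, {"node-a", "node-c"}, capacity_per_node=2)
print(after)  # {'pod-0': 'node-a', 'pod-1': 'node-c', 'pod-2': 'node-a'}
```

If the remaining nodes lack capacity, the function places as many pods as it can and leaves the rest unplaced, which mirrors the "if possible" qualifier in the text.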