13 Disaster Recovery Support

A service requires at least two pods to be highly available. The pods should run on different worker nodes, which Kubernetes can arrange by using pod anti-affinity. If one node goes down, it takes its pod down with it, and the remaining pods handle the requests until Kubernetes reschedules the lost pod on another available worker node.
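
The node spread described above can be checked from the command line. The following is a minimal sketch, not part of the UIM toolkit: the function names are hypothetical, and it assumes that the pod-to-node listing comes from the standard kubectl custom-columns output shown in the comment.

```shell
# Count the distinct worker nodes that host a set of pods.
# Reads "POD NODE" lines on stdin, for example from:
#   kubectl get pods -n uimprimary --no-headers \
#     -o custom-columns=POD:.metadata.name,NODE:.spec.nodeName
# Pods that are not yet scheduled (NODE shown as <none>) are ignored.
distinct_node_count() {
  awk 'NF >= 2 && $2 != "<none>" { seen[$2] = 1 }
       END { n = 0; for (k in seen) n++; print n }'
}

# Succeed only if the pods on stdin are spread over at least two nodes.
spread_over_two_nodes() {
  [ "$(distinct_node_count)" -ge 2 ]
}
```

Piping the kubectl listing into spread_over_two_nodes returns a nonzero status when all pods share a single worker node, which is the situation that pod anti-affinity is meant to prevent.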

For database high availability, you can use Oracle Real Application Clusters (RAC) to run a single Oracle Database across multiple servers, which maximizes availability and enables horizontal scalability.

Disaster Recovery across Data Centers

Disaster recovery for a complete data center outage relies on a second, passive data center.

Figure 13-1 documents the disaster recovery plan for the data center. A parallel passive data center is maintained, and the runtime data is periodically replicated from the active data center to the passive data center. In the event of a catastrophic failure in the primary (active) data center, the load must be switched to the secondary (passive) data center. Before switching the load to the secondary data center, shut down all the services in the primary data center and start all the services in the secondary data center.

Figure 13-1 Disaster Recovery Plan for Data Center



About Switchover and Failover

The purpose of a geographically redundant deployment is to provide resiliency in the event of a complete loss of service in the primary site, due to a natural disaster or other unrecoverable failure in the primary UIM site. This resiliency is achieved by creating one or more passive standby sites that can take the load when the primary site becomes unavailable. The role reversal from the standby site to the primary site can be accomplished in any of the following ways:

  • Switchover, in which the operator performs a controlled shutdown of the primary site before activating the standby site. This is primarily intended for planned service interruptions in the primary UIM site. Following a switchover, the former primary site becomes the standby site. The original site roles can be restored by performing a second switchover operation, which is called a switchback.
  • Failover, in which the primary site becomes unavailable due to unanticipated reasons and cannot be recovered. The operator then transitions the standby site to the primary role. The primary site that is down cannot act as a standby site and will require reconstruction of the database as a standby database before restoring the site roles.

About Kafka Mirror Maker

Kafka's Mirror Maker functionality makes it possible to maintain a replica of an existing Kafka cluster (which is used in the Message Bus service). It mirrors a source Kafka cluster into a target (mirror) Kafka cluster. Mirroring requires that both the source and the target Kafka clusters (that is, the Message Bus services) are up and running. If the target Kafka cluster is down or offline, nothing can be mirrored into it.
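
Because mirroring depends on both clusters being up, it can help to confirm that the Message Bus pods on each site are ready before you start Mirror Maker. The following is a minimal sketch under stated assumptions: the function name is hypothetical, and the input is assumed to be the default column layout of `kubectl get pods --no-headers` (NAME, READY, STATUS, RESTARTS, AGE).

```shell
# Succeed only if every pod line on stdin is Running and fully ready (n/n).
# An empty listing counts as a failure, since no pods means no cluster.
all_pods_ready() {
  awk '
    NF == 0 { next }
    {
      total++
      split($2, ready, "/")
      if ($3 != "Running" || ready[1] != ready[2]) bad++
    }
    END { if (total > 0 && bad == 0) exit 0; exit 1 }
  '
}
```

For example, `kubectl get pods -n uimprimary --no-headers | all_pods_ready` checks every pod in the namespace. In practice you would likely narrow the listing to the Message Bus pods with a label selector. The right selector depends on your deployment and is not shown here.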

Oracle Data Guard

Oracle Data Guard is responsible for replicating transactions from the Active DB to the Standby DB. It is included as a part of every Oracle DB Enterprise Edition installation.

Note:

When using multi-tenant databases involving CDBs and PDBs with Data Guard, the replication happens at the CDB level. This means that all the PDBs in the active CDB are replicated to the standby CDB, and that the commands to enable Data Guard must be run at the CDB level.
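
Before and after a role change, you can confirm the Data Guard role of each database by querying the DATABASE_ROLE column of V$DATABASE, which reports values such as PRIMARY and PHYSICAL STANDBY. The following is a minimal sketch: the function name is hypothetical, and how you obtain the query output (for example, through a SQL*Plus session connected to the CDB) is an assumption that depends on your environment.

```shell
# Succeed if the role text on stdin matches the expected Data Guard role.
# Input is the result of:  SELECT database_role FROM v$database;
# Surrounding blank lines and extra whitespace are ignored.
role_is() {
  expected="$1"
  actual="$(tr '\t\n' '  ' | tr -s ' ' | sed 's/^ //; s/ $//')"
  [ "$actual" = "$expected" ]
}
```

With this helper, a check such as `role_is 'PHYSICAL STANDBY'` against the standby site's query output fails if the site is not in the expected role.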

Installation and Configuration

If ATA is disabled in UIM Cloud Native, you do not need to deploy the Message Bus, ATA, and Mirror Maker services in the clusters. The commands in this section are intended to be used as samples. For detailed documentation on deploying UIM, see "Overview of the UIM Cloud Native Deployment" in UIM Cloud Native Deployment Guide.
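
The sample commands in this section reference the UIM_CNTK, COMMON_CNTK, and SPEC_PATH environment variables. The following is a minimal pre-flight sketch that reports any of these that are unset or empty before you run the commands. The function name is hypothetical and is not part of the toolkit.

```shell
# Print the names of required variables that are unset or empty.
# Returns nonzero if at least one is missing.
require_vars() {
  missing=""
  for name in "$@"; do
    # Look up the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    [ -n "$value" ] || missing="$missing $name"
  done
  if [ -n "$missing" ]; then
    echo "missing:$missing"
    return 1
  fi
}
```

For example, running `require_vars UIM_CNTK COMMON_CNTK SPEC_PATH || exit 1` at the top of a wrapper script stops the script before any toolkit script is called with an empty path.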

Setting up the Primary (active) Instance

To set up the primary (active) instance:

  1. Provision two databases: one for the primary site and one for the secondary site.
  2. Set up Data Guard between the primary site and the secondary site. The primary site should be in the ACTIVE role and the secondary site should be in the STANDBY role. Refer to the Oracle 19c documentation.
  3. Deploy UIM Cloud Native.
    1. Create image pull secrets (if required).
    2. Create UIM secrets for WLS admin, OPSS, WLS RTE, RCU DB and UIM DB.

      Note:

      uimprimary here refers to the Kubernetes namespace where the primary instance will be deployed. Replace this with the desired namespace.
      $UIM_CNTK/scripts/manage-instance-credentials.sh -p uimprimary -i dr create wlsadmin,opssWP,wlsRTE,rcudb,uimdb
    3. Create the WebLogic encrypted password.
      $UIM_CNTK/scripts/install-uimdb.sh -p uimprimary -i dr -s $SPEC_PATH -c 8
    4. Create UIM users secrets.
      $UIM_CNTK/samples/credentials/manage-uim-credentials.sh -p uimprimary -i dr -c create -f "/home/spec_dir/users.txt"
    5. Create DB schemas.
      $UIM_CNTK/scripts/install-uimdb.sh -p uimprimary -i dr -s $SPEC_PATH -c 1
      $UIM_CNTK/scripts/install-uimdb.sh -p uimprimary -i dr -s $SPEC_PATH -c 2
    6. Create UIM instance.
      $UIM_CNTK/scripts/create-ingress.sh -p uimprimary -i dr -s $SPEC_PATH
      $UIM_CNTK/scripts/create-instance.sh -p uimprimary -i dr -s $SPEC_PATH
    7. Add UIM user roles.
      $UIM_CNTK/samples/credentials/assign-role.sh -p uimprimary -i dr -f uim-users-roles.txt
  4. Deploy Message Bus.
    $COMMON_CNTK/scripts/create-applications.sh -p uimprimary -i dr -f $SPEC_PATH/<project>/<instance>/applications.yaml -a messaging-bus
  5. Deploy ATA:
    1. Create Topology DB secrets:
      $COMMON_CNTK/scripts/manage-app-credentials.sh -p uimprimary -i dr -f $SPEC_PATH/<project>/<instance>/applications.yaml -a ata create database
    2. Create Topology UIM secrets:
      $COMMON_CNTK/scripts/manage-app-credentials.sh -p uimprimary -i dr -f $SPEC_PATH/<project>/<instance>/applications.yaml -a ata create uim
    3. Create DB schemas:
      $COMMON_CNTK/scripts/install-database.sh -p uimprimary -i dr -f $SPEC_PATH/<project>/<instance>/database.yaml -a ata -c 1
    4. Deploy Topology:
      $COMMON_CNTK/scripts/create-applications.sh -p uimprimary -i dr -f $SPEC_PATH/<project>/<instance>/applications.yaml -a ata

See "Deploying Unified Operations Message Bus" for deploying Message Bus and "Deploying the Active Topology Automator Service" for deploying ATA.

See "Overview of the UIM Cloud Native Deployment" in UIM Cloud Native Deployment Guide for deploying UIM.

Setting up the Secondary (standby) Instance

To set up the secondary (standby) instance:

  1. Perform a switchover operation on the active (primary site) DB. The secondary site DB should now be in the ACTIVE role and the primary site DB should be in the STANDBY role. Refer to the Oracle 19c documentation.
  2. Deploy UIM Cloud Native:
    1. Export OPSS wallet file secret from primary instance and recreate in secondary instance.

      Note:

      uimsecondary here refers to the Kubernetes namespace where the secondary instance will be deployed. Replace this with the desired namespace.
      kubectl -n uimprimary get configmap uimprimary-dr-weblogic-domain-introspect-cm -o jsonpath='{.data.ewallet\.p12}' > ./primary_ewallet.p12
      $UIM_CNTK/scripts/manage-instance-credentials.sh -p uimsecondary -i dr create opssWF
    2. (Optional) Create image pull secrets.
    3. Create UIM secrets for WLS admin, OPSS, WLS RTE, RCU DB and UIM DB:
      $UIM_CNTK/scripts/manage-instance-credentials.sh -p uimsecondary -i dr create wlsadmin,opssWP,wlsRTE,rcudb,uimdb
    4. Create the WebLogic encrypted password:
      $UIM_CNTK/scripts/install-uimdb.sh -p uimsecondary -i dr -s $SPEC_PATH -c 8
    5. Create UIM users secrets:
      $UIM_CNTK/samples/credentials/manage-uim-credentials.sh -p uimsecondary -i dr -c create -f "/home/spec_dir/users.txt"
    6. Create UIM instance:
      $UIM_CNTK/scripts/create-ingress.sh -p uimsecondary -i dr -s $SPEC_PATH
      $UIM_CNTK/scripts/create-instance.sh -p uimsecondary -i dr -s $SPEC_PATH
    7. Add UIM user roles:
      $UIM_CNTK/samples/credentials/assign-role.sh -p uimsecondary -i dr -f uim-users-roles.txt
  3. Deploy Message Bus:
    $COMMON_CNTK/scripts/create-applications.sh -p uimsecondary -i dr -f $SPEC_PATH/<project>/<instance>/applications.yaml  -a messaging-bus
  4. Deploy ATA:
    1. Create Topology DB secrets:
      $COMMON_CNTK/scripts/manage-app-credentials.sh -p uimsecondary -i dr -f $SPEC_PATH/<project>/<instance>/applications.yaml  -a ata create database
    2. Create Topology UIM secrets:
      $COMMON_CNTK/scripts/manage-app-credentials.sh -p uimsecondary -i dr -f $SPEC_PATH/<project>/<instance>/applications.yaml  -a ata create uim
    3. Deploy Topology:
      $COMMON_CNTK/scripts/create-applications.sh -p uimsecondary -i dr -f $SPEC_PATH/<project>/<instance>/applications.yaml  -a ata
  5. Deploy Mirror Maker. See "Installing and Configuring Mirror Maker 2.0" for more information.
  6. After the secondary instance has been set up, switch back over to the primary (active) site.
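
The wallet export in the Deploy UIM Cloud Native step above redirects kubectl output to a file. By default, kubectl prints nothing and still exits successfully when a JSONPath key is missing, so the redirect can leave an empty file behind. The following is a minimal sketch of a guard to run before recreating the OPSS wallet secret. The function name is hypothetical.

```shell
# Succeed only if the exported wallet file exists and is not empty.
wallet_exported() {
  [ -s "$1" ]
}
```

For example, `wallet_exported ./primary_ewallet.p12 || echo "wallet export is empty"` flags a failed export before the secondary instance is created with it.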

Switchover Sequence

To perform a switchover between site A (active) and site B (standby):

  1. Bring down the instances in site A. These include UIM and ATA. Message Bus must remain enabled so that Mirror Maker can perform the replication.
    #Disable topology
    $COMMON_CNTK/scripts/delete-applications.sh -p uimprimary -i dr -f $SPEC_PATH/<project>/<instance>/applications.yaml -a ata
    #Disable UIM
    $UIM_CNTK/scripts/delete-instance.sh -p uimprimary -i dr -s $SPEC_PATH
  2. Perform switchover on DB. Site B DB will now become Primary. Site A DB will assume Standby role. Refer to Oracle 19c Documentation.
  3. Bring up instances in site B. This includes UIM and ATA. Message Bus should already be active:
    #Enable UIM
    $UIM_CNTK/scripts/create-instance.sh -p uimsecondary -i dr -s $SPEC_PATH
    #Enable topology
    $COMMON_CNTK/scripts/create-applications.sh -p uimsecondary -i dr -f $SPEC_PATH/<project>/<instance>/applications.yaml -a ata
  4. Perform DNS switching to route all traffic to site B.
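
The sequence above can be captured as a dry-run plan so that a switchover and the later switchback use the same ordering with the namespaces swapped. The following is a minimal sketch: the function name is hypothetical, it only prints the commands from this section without running them, and the database and DNS steps remain manual.

```shell
# Print, in order, the steps for a switchover from one namespace to another.
# Nothing is executed. Manual steps are printed as comment lines.
# Arguments: source namespace, target namespace, instance name.
switchover_plan() {
  from_ns="$1"
  to_ns="$2"
  inst="$3"
  cat <<EOF
\$COMMON_CNTK/scripts/delete-applications.sh -p $from_ns -i $inst -f \$SPEC_PATH/<project>/<instance>/applications.yaml -a ata
\$UIM_CNTK/scripts/delete-instance.sh -p $from_ns -i $inst -s \$SPEC_PATH
# manual: perform the DB switchover so that the $to_ns site DB becomes Primary
\$UIM_CNTK/scripts/create-instance.sh -p $to_ns -i $inst -s \$SPEC_PATH
\$COMMON_CNTK/scripts/create-applications.sh -p $to_ns -i $inst -f \$SPEC_PATH/<project>/<instance>/applications.yaml -a ata
# manual: switch DNS to route all traffic to $to_ns
EOF
}
```

For example, `switchover_plan uimprimary uimsecondary dr` prints the plan for moving from site A to site B, and `switchover_plan uimsecondary uimprimary dr` prints the switchback.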

Failover Sequence

In case of any irrecoverable failure in the primary site, perform a failover operation on the standby site. To do so:

  1. Perform failover on DB. Standby (secondary) DB will now become Primary. Primary site DB will assume Deactivated Standby role. Refer to Oracle 19c Documentation.
  2. Bring up instances in standby. This includes UIM and Topology. Message Bus should already be active:
    #Enable UIM
    $UIM_CNTK/scripts/create-instance.sh -p uimsecondary -i dr -s $SPEC_PATH
    #Enable topology
    $COMMON_CNTK/scripts/create-applications.sh -p uimsecondary -i dr -f $SPEC_PATH/<project>/<instance>/applications.yaml -a ata
  3. Perform DNS switching to route all traffic to secondary instances.

Once the primary site is restored, establish synchronization between the secondary and primary sites. To do so:

  1. Bring up Message Bus and DB in primary site:
    #Enable message bus
    $COMMON_CNTK/scripts/create-applications.sh -p uimprimary -i dr -f $SPEC_PATH/<project>/<instance>/applications.yaml -a messaging-bus
  2. Set up Kafka Mirror Maker with the secondary Message Bus as the source and the primary Message Bus as the target. See "About Kafka Mirror Maker" for more information.
  3. Switch the primary DB role from Deactivated Standby to Standby. See "Deploying Unified Operations Message Bus" for more information.

After the synchronization between the secondary and primary sites is established, perform a switchover to the primary site. To do so:

  1. Bring up UIM in primary site:
    $UIM_CNTK/scripts/create-instance.sh -p uimprimary -i dr -s $SPEC_PATH
  2. Bring up Topology in primary site:
    $COMMON_CNTK/scripts/create-applications.sh -p uimprimary -i dr -f $SPEC_PATH/<project>/<instance>/applications.yaml -a ata
  3. Perform DNS switching to route all traffic to primary instances.
  4. Bring down instances in secondary site. This includes UIM and Topology. Message Bus should remain active for Kafka Mirror Maker synchronization:
    #Disable topology
    $COMMON_CNTK/scripts/delete-applications.sh -p uimsecondary -i dr -f $SPEC_PATH/<project>/<instance>/applications.yaml -a ata
    #Disable UIM
    $UIM_CNTK/scripts/delete-instance.sh -p uimsecondary -i dr -s $SPEC_PATH