High Availability Clustering

With Oracle Linux instances running on Oracle Cloud Infrastructure (OCI), you can create high availability (HA) clusters that deliver continuous access to applications and services running across multiple nodes. HA clustering minimizes downtime and keeps services available when system components fail.

You can create HA clusters with OCI instances by installing and using Pacemaker, an open source high availability resource manager, and Corosync, an open source cluster engine. For more information about HA clustering and the Pacemaker and Corosync technologies, see the upstream Pacemaker and Corosync documentation.

Prerequisite

Before you begin, configure a shared storage device to be accessible from all nodes that you want in the HA cluster. A shared storage device is needed for cluster service and application messaging, and for cluster SBD fencing. For more information about setting up a shared storage device, see the documentation for the shared storage option that you're using.
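As a quick sanity check for this prerequisite, a sketch like the following can confirm that the shared device is visible on a node before you continue. Run it on every node that you want in the cluster; the device path matches the example used later in this document, so substitute the by-id link for your own shared disk.

```shell
#!/bin/sh
# Sketch: confirm the shared storage device is visible on this node.
# The path below is an example; substitute your shared disk's by-id link.
DEVICE="/dev/disk/by-id/wwn-0XY3acy9d40afd88083ACR90"

# -b tests that the path exists and is a block device.
if [ -b "$DEVICE" ]; then
  echo "shared device present: $DEVICE"
else
  echo "shared device missing: $DEVICE" >&2
fi
```

Using a /dev/disk/by-id/ link rather than a /dev/sdX name keeps the check stable across reboots, since by-id paths are persistent.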

Setting Up High Availability Clustering With OCI Instances

To set up high availability clustering with OCI instances:

  1. Install the Pacemaker software
  2. Create an HA cluster
  3. Configure fencing

Installing Pacemaker

To create a high availability (HA) cluster with Oracle Cloud Infrastructure (OCI) instances, you must first install the Pacemaker and Corosync packages on each instance, or node, that you want in the cluster. You can then configure each cluster node, and ensure that the Pacemaker service automatically starts and runs on each node at boot time.

Note

The terms OCI instance, node, and cluster node are used interchangeably in the context of HA clustering on OCI.

Best Practice

For each OCI instance that you want in the cluster, open a terminal window and connect to the instance.

For example, if you want two OCI instances to be nodes in the cluster, open two terminal windows, and connect to each instance using ssh:
ssh instance-IP-address

Having a terminal window open for each node prevents the need to repeatedly log in and out of the nodes when configuring the HA cluster.
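As an alternative sketch, a short loop can run the same command on every node from a single terminal instead of keeping separate windows open. This assumes key-based ssh access and uses the hypothetical host names node1 and node2; replace them with your own.

```shell
#!/bin/sh
# Sketch: run one command on every cluster node from a single terminal.
# NODES lists hypothetical resolvable host names; replace with your own.
NODES="node1 node2"

for node in $NODES; do
  echo "== $node =="
  # BatchMode=yes makes ssh fail fast instead of prompting for a password
  # if key-based authentication is not set up for the node.
  ssh -o BatchMode=yes "$node" hostname || echo "could not reach $node" >&2
done
```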

Installing Pacemaker and Corosync

To install the Pacemaker and Corosync packages and configure the HA cluster nodes:

  1. Complete the prerequisite in High Availability Clustering.
  2. Enable the repository on the Oracle Linux yum server where the Pacemaker and Corosync packages reside.

    Oracle Linux 10:

    sudo dnf config-manager --enable ol10_addons

    Oracle Linux 9:

    sudo dnf config-manager --enable ol9_addons

    Oracle Linux 8:

    sudo dnf config-manager --enable ol8_addons
  3. On each node, install the pcs command-line tool, the Pacemaker software packages, the resource agents, and the SBD fence agent:

    sudo dnf install pcs pacemaker resource-agents fence-agents-sbd
  4. Configure the firewall so that the cluster service components can communicate across the network. The first command adds the high-availability service to the permanent firewall configuration; the second applies it to the running configuration:

    sudo firewall-cmd --permanent --add-service=high-availability
    sudo firewall-cmd --add-service=high-availability
  5. On each node, set a password for the hacluster user:

    sudo passwd hacluster
    Tip

    Set the same password on each node to avoid authorization issues when running pcs commands on different nodes within the same cluster.

  6. On each node, set the pcsd service to run and start at boot:

    sudo systemctl enable --now pcsd.service
  7. Create an HA cluster using the nodes you have configured. See Creating an HA Cluster.
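After completing the steps above on a node, a quick sanity check such as the following sketch can confirm that everything is in place before you move on to creating the cluster. It only reads state, so it's safe to run at any time.

```shell
#!/bin/sh
# Sketch: post-install sanity check for one node. Verifies that the packages
# from the steps above are installed and that pcsd is enabled and running.
for pkg in pcs pacemaker resource-agents fence-agents-sbd; do
  if rpm -q "$pkg" >/dev/null 2>&1; then
    echo "installed: $pkg"
  else
    echo "missing:   $pkg" >&2
  fi
done

# pcsd should report "enabled" and "active" after `systemctl enable --now`.
systemctl is-enabled pcsd.service 2>/dev/null || echo "pcsd not enabled" >&2
systemctl is-active  pcsd.service 2>/dev/null || echo "pcsd not running" >&2
```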

Creating an HA Cluster

With the Pacemaker and Corosync software, you can create a high availability (HA) cluster with Linux instances running on Oracle Cloud Infrastructure (OCI).

To create an HA cluster:

  1. Install the Pacemaker and Corosync software packages on each node you want in the cluster. See Installing Pacemaker.
  2. From one of the nodes, authenticate the pcs cluster configuration tool for the hacluster user of each cluster node.

    For example, if you want two nodes to make up the HA cluster, run the following command from one of the cluster nodes:

    sudo pcs host auth node1 node2 -u hacluster

    Replace node1 and node2 with the resolvable host names of the nodes that are to form the cluster.

    Alternatively, if the host names are not resolvable, specify the IP address for each node as shown in the following example:

    sudo pcs host auth node1 addr=192.0.2.1 node2 addr=192.0.2.2 -u hacluster

    Replace 192.0.2.1 and 192.0.2.2 with the IP address of each of the respective hosts in the cluster.

  3. When prompted, enter the password that you defined for the hacluster user when you installed and configured the Pacemaker software on each node.

  4. Create the HA cluster by using the pcs cluster setup command, and specifying the following:

    • Name of the cluster
    • The host name and IP address of each node that you want in the cluster

    For example, to create an HA cluster with two nodes:

    sudo pcs cluster setup cluster-name node1 addr=192.0.2.1 node2 addr=192.0.2.2

    Replace 192.0.2.1 and 192.0.2.2 with the IP address of each of the respective hosts in the cluster.

  5. From one of the nodes, start the cluster on all nodes:

    sudo pcs cluster start --all
  6. Optionally, you can enable these services to start at boot time so that if a node reboots, it automatically rejoins the cluster. To do this, run the following command on one of the nodes:

    sudo pcs cluster enable --all
    Note

    Some users prefer not to enable these services so that a node failure resulting in a full system reboot can be properly debugged before it rejoins the cluster.

  7. Configure SBD fencing for the newly created HA cluster. See Configuring Fencing.
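Once the cluster is started, a quick status pass such as the following sketch can confirm that all nodes joined and that the Corosync messaging layer is healthy. It guards against being run on a machine that doesn't have the cluster tools installed.

```shell
#!/bin/sh
# Sketch: confirm the new cluster is up. Run on any cluster node.
if command -v pcs >/dev/null 2>&1; then
  # Summarizes node membership, resources, and daemon status.
  sudo pcs status
  # Reports the local Corosync ring status for the messaging layer.
  sudo corosync-cfgtool -s
else
  echo "pcs is not installed; run this on a configured cluster node" >&2
fi
```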

Configuring Fencing

STONITH Block Device (SBD) fencing works with the Pacemaker software to protect data when a node in a high availability (HA) cluster becomes unresponsive. Fencing prevents the unresponsive node from accessing shared data in the HA cluster until the Pacemaker software takes that unresponsive node offline.

SBD fencing configuration is the last step in completing the setup of an HA cluster with OCI instances. For information about creating an HA cluster, see Creating an HA Cluster.

Note

To create HA clusters with OCI instances, you must use only the SBD cluster fencing mechanism. Other cluster fencing mechanisms aren't currently supported in this environment.

Configuring SBD Fencing for an HA Cluster

To configure SBD fencing for an HA cluster:

  1. From one of the cluster nodes, enable stonith (Shoot The Other Node In The Head), a fencing technique that's used as part of the SBD fencing strategy.

    sudo pcs property set stonith-enabled=true
  2. From one of the nodes, stop the cluster:

    sudo pcs cluster stop --all
  3. On each node, install and configure the SBD daemon:

    sudo dnf install sbd
  4. On each node, enable the sbd systemd service:

    sudo systemctl enable sbd
    Note

    When enabled, the sbd systemd service automatically starts and stops as a dependency of the Pacemaker service. This means you don't need to run the sbd service independently, and you can't manually start or stop the service. If you try to manually start or stop it, the state of the service remains the same, and an error message is displayed, indicating that the service is a dependent service.
  5. On each node, edit the /etc/sysconfig/sbd file and set the SBD_DEVICE parameter to identify the shared storage device. Use a persistent device path, such as a link within the /dev/disk/by-id/ directory, to do this.

    For example, if the shared storage device is available on /dev/disk/by-id/wwn-0XY3acy9d40afd88083ACR90, ensure the /etc/sysconfig/sbd file on each node contains the following line:

    SBD_DEVICE="/dev/disk/by-id/wwn-0XY3acy9d40afd88083ACR90"

    For information about shared storage devices, see the Prerequisite section.

  6. Continue to edit the /etc/sysconfig/sbd file on each node by setting the watchdog device parameter to /dev/null:

    SBD_WATCHDOG_DEV=/dev/null
  7. From one of the nodes, create the SBD messaging layout on the shared storage device, and confirm that it's in place.

    For example, to set up and verify messaging on the shared storage device at /dev/disk/by-id/wwn-0XY3acy9d40afd88083ACR90:

    sudo sbd -d /dev/disk/by-id/wwn-0XY3acy9d40afd88083ACR90 create 
    sudo sbd -d /dev/disk/by-id/wwn-0XY3acy9d40afd88083ACR90 list
  8. From one of the nodes, start the cluster and configure the fence_sbd fencing agent for the shared storage device.

    For example, to start the cluster and configure the shared storage device at /dev/disk/by-id/wwn-0XY3acy9d40afd88083ACR90:

    sudo pcs cluster start --all 
    sudo pcs stonith create sbd_fencing fence_sbd devices=/dev/disk/by-id/wwn-0XY3acy9d40afd88083ACR90
  9. To check that the stonith configuration has been set up correctly, run the following commands:

    sudo pcs stonith config
    sudo pcs cluster verify --full
  10. To check the status of the stonith configuration, run the following command:

    sudo pcs stonith
  11. To check the status of the cluster, run the following command:

    sudo pcs status
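With fencing configured, one way to exercise it is sketched below: fence one node from another and watch it rejoin the cluster. Be aware that the target node is forcibly reset, so only try this on a cluster that isn't serving production traffic. The node name is a hypothetical example.

```shell
#!/bin/sh
# Sketch: exercise SBD fencing by fencing one node from another.
# WARNING: the target node is forcibly reset. Do not run this against a
# node that is serving production workloads.
TARGET="node2"   # hypothetical node name; use one of your cluster nodes

if command -v pcs >/dev/null 2>&1; then
  # Ask the cluster to fence the target node through the configured agent.
  sudo pcs stonith fence "$TARGET"
  # After the target reboots and rejoins, the cluster should report it online.
  sudo pcs status
else
  echo "pcs is not installed; run this from a configured cluster node" >&2
fi
```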