7 Configuring High Availability Features
WARNING:
Oracle Linux 7 is now in Extended Support. See Oracle Linux Extended Support and Oracle Open Source Support Policies for more information.
Migrate applications and data to Oracle Linux 8 or Oracle Linux 9 as soon as possible.
This chapter describes how to configure the Pacemaker and Corosync technologies to create an HA cluster that delivers continuous access to services running across multiple nodes.
More information and documentation on Pacemaker and Corosync can also be found at https://clusterlabs.org/pacemaker/doc/.
About Oracle Linux High Availability Services
Oracle Linux high availability services comprises several open-source packages, including Corosync and Pacemaker, that provide the tools to achieve high availability for applications and services running on Oracle Linux. You can download Corosync, Pacemaker and the functional subpackages from the Unbreakable Linux Network at https://linux.oracle.com or the Oracle Linux yum server at https://yum.oracle.com.
Corosync is an open source cluster engine that includes an API to implement a number of high availability features, including an availability manager that can restart a process when it fails, a configuration and statistics database and a quorum system that can notify applications when quorum is achieved or lost.
Corosync is installed in conjunction with Pacemaker, an open source high availability cluster resource manager responsible for managing the life-cycle of software deployed on a cluster and for providing high availability services. High availability services are achieved by detecting and recovering from node and resource level failures via the API provided by the cluster engine.
Pacemaker also ships with the Pacemaker Command Shell (pcs) that can be used to access and configure the cluster and its resources. The pcs daemon runs as a service on each node in the cluster, making it possible to synchronize configuration changes across all of the nodes in the cluster.
Oracle provides support for Corosync and Pacemaker used for an active-passive 2-node (1:1) cluster configuration on Oracle Linux 7.3 or higher. Support for clustering services does not imply support for Oracle products clustered using these services.
Oracle also provides Oracle Clusterware for high availability clustering with Oracle Database. You can find more information at https://www.oracle.com/database/technologies/rac/clusterware.html.
Installing Pacemaker and Corosync
      On each node in the cluster, install the pcs
      and pacemaker software packages along with all
      available resource and fence agents from the Oracle Linux yum server or from the
      Unbreakable Linux Network.
    
                  
sudo yum install pcs pacemaker resource-agents fence-agents-all
      If you are running firewalld, you should add
      the high-availability service on each of the
      nodes, so that the service components are able to communicate
      across the network. This step typically enables TCP ports 2224
      (used by the pcs daemon), 3121 (for Pacemaker Remote nodes), 21064
      (for DLM resources); and UDP ports 5405 (for Corosync clustering)
      and 5404 (for Corosync multicast, if this is configured).
    
                  
sudo firewall-cmd --permanent --add-service=high-availability
sudo firewall-cmd --add-service=high-availability
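To confirm that the rule took effect, you can inspect the services that firewalld reports for the active zone. The following is a minimal sketch and not part of the original procedure: has_ha_service is a hypothetical helper name, and it only checks a space-separated service list such as the one printed by firewall-cmd --list-services.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the space-separated service list passed as
# the first argument contains the high-availability service.
has_ha_service() {
    case " $1 " in
        *" high-availability "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example use on a node (commented out so the sketch has no side effects):
#   has_ha_service "$(sudo firewall-cmd --list-services)" && echo "rule present"
```

Padding the list with spaces on both sides keeps the match exact, so a service whose name merely contains the string is not mistaken for the real one.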
      To use the pcs command to configure and manage
      your cluster, a password must be set on each node for the
      hacluster user. It is helpful if the password
      that you set for this user is the same on each node. Use the
      passwd command on each node to set the
      password:
    
                  
sudo passwd hacluster
      To use the pcs command, the
      pcsd service must be running on each of the
      nodes in the cluster. You can set this service to run and to start
      at boot using the following commands:
    
                  
sudo systemctl start pcsd.service
sudo systemctl enable pcsd.service
Configuring an Initial Cluster and Service
      In the following example, a cluster is configured across two nodes
      hosted on systems with the resolvable hostnames of
      node1 and node2. Each system
      is installed and configured using the instructions provided in
      Installing Pacemaker and Corosync.
    
                  
      The cluster is configured to run a service,
      Dummy, that is included in the
      resource-agents package that you should have
      installed along with the pacemaker packages. This tool simply
      keeps track of whether it is running or not. We configure
      Pacemaker with an interval parameter that determines how long it
      should wait between checks to determine whether the
      Dummy process has failed.
    
                  
      We manually stop the Dummy process outside of
      the Pacemaker tool to simulate a failure and use this to
      demonstrate how the process is restarted automatically on an
      alternate node.
    
                  
Creating the Cluster
- Authenticate the pcs cluster configuration tool for the hacluster user on each node in your configuration. To do this, run the following command on one of the nodes that will form part of the cluster:

  sudo pcs cluster auth node1 node2 -u hacluster

  Replace node1 and node2 with the resolvable hostnames of the nodes that will form part of the cluster. The tool prompts you for the password of the hacluster user. Provide the password that you set for this user when you installed and configured the Pacemaker software on each node.
- To create the cluster, use the pcs cluster setup command. You must specify a name for the cluster and the resolvable hostnames for each node in the cluster:

  sudo pcs cluster setup --name pacemaker1 node1 node2

  Replace pacemaker1 with an appropriate name for the cluster. Replace node1 and node2 with the resolvable hostnames of the nodes in the cluster.
- Start the cluster on all nodes. You can do this manually using the pcs command:

  sudo pcs cluster start --all

  You can also do this by starting the pacemaker and corosync services from systemd:

  sudo systemctl start pacemaker.service
  sudo systemctl start corosync.service

  Optionally, you can enable these services to start at boot time, so that if a node reboots it automatically rejoins the cluster:

  sudo systemctl enable pacemaker.service
  sudo systemctl enable corosync.service

  Some users prefer not to enable these services, so that a node failure resulting in a full system reboot can be properly debugged before it rejoins the cluster.
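After the cluster starts, it can be useful to confirm that both nodes have joined. The following is a minimal sketch under stated assumptions: online_nodes is a hypothetical helper name, and the "Online: [ ... ]" line format it expects is taken from the sample pcs status output shown later in this chapter.

```shell
#!/bin/sh
# Hypothetical helper: read `pcs status` output on stdin and print the node
# names from the "Online: [ ... ]" line, separated by spaces.
online_nodes() {
    sed -n 's/^[[:space:]]*Online: \[ \(.*\) ]$/\1/p'
}

# Example use on a cluster node (commented out so the sketch has no side effects):
#   sudo pcs status | online_nodes
```

If a node is missing from the printed list, check that the pacemaker and corosync services are running on that node.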
Setting Cluster Parameters
- Fencing is an advanced feature that helps protect your data from being corrupted by nodes that may be failing or unavailable. Pacemaker uses the term stonith (shoot the other node in the head) to describe fencing options. Since this configuration depends on particular hardware and a deeper understanding of the fencing process, we recommend disabling the fencing feature for this example:

  sudo pcs property set stonith-enabled=false

  Fencing is an important part of setting up a production-level HA cluster and is disabled in this example only to keep things simple. If you intend to take advantage of stonith, see Fencing Configuration for more information.
- Since this example is a two-node cluster, you can disable the no-quorum policy, because quorum is only meaningful with three or more nodes. Quorum is achieved only when more than half of the nodes agree on the status of the cluster. With two nodes, a single surviving node can never hold more than half of the votes, so configure the cluster to ignore the quorum state:

  sudo pcs property set no-quorum-policy=ignore
- Configure a migration policy. In this example we configure the cluster to move the service to a new node after a single failure:

  sudo pcs resource defaults migration-threshold=1
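The quorum rule mentioned in the steps above ("more than half of the nodes") can be checked with simple arithmetic. The following sketch assumes that each node carries exactly one vote; quorum_votes is a hypothetical helper name used only for illustration.

```shell
#!/bin/sh
# Votes needed for quorum: strictly more than half of the configured nodes,
# assuming one vote per node.
quorum_votes() {
    echo $(( $1 / 2 + 1 ))
}

for nodes in 2 3 5; do
    echo "$nodes nodes: $(quorum_votes "$nodes") votes needed"
done
```

With two nodes, both votes are needed, so the loss of either node loses quorum. With three nodes, two votes are enough, so one node can fail without affecting quorum.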
Creating a Service and Testing Failover
 Services are created and are usually configured to run a resource agent that is
            responsible for starting and stopping processes. Most resource agents are created
            according to the OCF (Open Cluster Framework) specification defined as an extension for
            the Linux Standard Base (LSB). There are many handy resource agents for commonly used
            processes included in the resource-agents packages, including a variety
            of heartbeat agents that track whether commonly used daemons or services are still
            running. 
                     
In this example we set up a service that uses a Dummy resource agent created precisely for the purpose of testing Pacemaker. We use this agent because it requires the least possible configuration and does not make any assumptions about your environment or the types of services that you intend to run with Pacemaker.
- To add the service as a resource, use the pcs resource create command. Provide a name for the service. In the example below, we use the name dummy_service for this resource:

  sudo pcs resource create dummy_service ocf:pacemaker:Dummy op monitor interval=120s

  To invoke the Dummy resource agent, a notation (ocf:pacemaker:Dummy) is used to specify that it conforms to the OCF standard, that it runs in the pacemaker namespace and that the Dummy script should be used. If you were configuring a heartbeat monitor service for an Oracle Database, you might use the ocf:heartbeat:oracle resource agent.

  The resource is configured to use the monitor operation in the agent and an interval is set to check the health of the service. In this example we set the interval to 120s to give the service some time to fail while you are demonstrating failover. By default, this is usually set to 20 seconds, but may be modified depending on the type of service and your own environment.
- As soon as you create a service, the cluster attempts to start the resource on a node using the resource agent's start command. You can see the resource start and run status by running the pcs status command:

  sudo pcs status
  Cluster name: pacemaker1
  Stack: corosync
  Current DC: node1 (version 1.1.16-12.el7-94ff4df) - partition with quorum
  Last updated: Wed Jan 17 06:35:18 2018
  Last change: Wed Jan 17 03:08:00 2018 by root via cibadmin on node1

  2 nodes configured
  1 resource configured

  Online: [ node2 node1 ]

  Full list of resources:

   dummy_service   (ocf::pacemaker:Dummy): Started node2

  Daemon Status:
    corosync: active/enabled
    pacemaker: active/enabled
    pcsd: active/enabled
- Simulate service failure by force stopping the service directly, using crm_resource, so that the cluster is unaware that the service has been manually stopped:

  sudo crm_resource --resource dummy_service --force-stop
- Run crm_mon in interactive mode so that you can wait until you see the service fail and a Failed Actions message is displayed. You should see the service restart on the alternate node.

  sudo crm_mon
  Stack: corosync
  Current DC: node1 (version 1.1.16-12.el7-94ff4df) - partition with quorum
  Last updated: Wed Jan 17 06:41:04 2018
  Last change: Wed Jan 17 06:39:02 2018 by root via cibadmin on node1

  2 nodes configured
  1 resource configured

  Online: [ node2 node1 ]

  Active resources:

  dummy_service   (ocf::pacemaker:Dummy): Started node1

  Failed Actions:
  * dummy_service_monitor_120000 on node2 'not running' (7): call=16, status=complete, exitreason='none',
      last-rc-change='Wed Jan 17 06:41:02 2018', queued=0ms, exec=0ms

  You can use the Ctrl-C key combination to exit crm_mon at any point.
- You can try rebooting the node where the service is running to see that failover also occurs in the case of node failure. Note that if you have not enabled the corosync and pacemaker services to start on boot, you may need to manually start the cluster services on the node that you rebooted. For example:

  sudo pcs cluster start node1
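To compare where the resource runs before and after the simulated failure, you can extract the node name from the status output. This is a minimal sketch: resource_node is a hypothetical helper name, and the line format it parses is the one shown in the sample output in the steps above.

```shell
#!/bin/sh
# Hypothetical helper: read `pcs status` or `crm_mon` output on stdin and
# print the node on which the named resource is reported as Started.
resource_node() {
    awk -v name="$1" '$1 == name && $(NF - 1) == "Started" { print $NF }'
}

# Example use on a cluster node (commented out so the sketch has no side effects):
#   sudo pcs status | resource_node dummy_service
```

Running the helper before the forced stop and again after the Failed Actions message appears should print two different node names if failover worked.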
Fencing Configuration
      Fencing or stonith is used to protect data when
      nodes become unresponsive. If a node fails to respond, it may
      still be accessing data. To be sure that your data is safe, you
      can use fencing to prevent a live node from having access to the
      data until the original node is truly offline. To do this, you
      must configure a device that can ensure that a node is taken
      offline. There are a number of fencing agents available that can
      be configured for this purpose. In general,
      stonith relies on particular hardware and
      service protocols that can forcibly reboot or shut down nodes
      physically to protect the cluster.
    
                  
In this section, different configurations using some of the available fencing agents are presented as examples. Note that these examples make certain presumptions about hardware and assume that you are already aware of how to set up, configure and use the hardware concerned. The examples are provided for basic guidance and it is recommended that you also refer to upstream documentation to familiarize yourself with some of the concepts presented here.
      Before proceeding with any of these example configurations, you
      must ensure that stonith is enabled for your
      cluster configuration:
    
                  
sudo pcs property set stonith-enabled=true
      After you have configured stonith, you can
      check your configuration to ensure that it is set up correctly by
      running the following commands:
    
                  
sudo pcs stonith show --full
sudo pcs cluster verify -V
      To check the status of your stonith
      configuration, run:
    
                  
sudo pcs stonith
To check the status of your cluster, run:
sudo pcs status
IPMI LAN Fencing
        Intelligent Platform Management Interface (IPMI) is an interface
        to a subsystem that provides management features of the host
        system's hardware and firmware and includes facilities to power
        cycle a system over a dedicated network without any requirement
        to access the system's operating system. The
        fence_ipmilan fencing agent can be configured
        for the cluster so that stonith can be
        achieved across the IPMI LAN.
      
                     
        If your systems are configured for IPMI, you can run the
        following commands on one of the nodes in the cluster to enable
        the ipmilan fencing agent and to configure
        stonith for both nodes:
      
                     
sudo pcs stonith create ipmilan_n1_fencing fence_ipmilan pcmk_host_list=node1 delay=5 \
  ipaddr=203.0.113.1 login=root passwd=password lanplus=1 op monitor interval=60s
sudo pcs stonith create ipmilan_n2_fencing fence_ipmilan pcmk_host_list=node2 \
  ipaddr=203.0.113.2 login=root passwd=password lanplus=1 op monitor interval=60s
In the above example, the host named node1 has an IPMI LAN interface configured on the IP 203.0.113.1. The host named node2 has an IPMI LAN interface configured on the IP 203.0.113.2. The root user password for the IPMI login on both systems is specified here as password. In each instance, you should replace these configuration variables with the appropriate information to match your own environment.
Note that the delay option should be set on only one node. This helps to ensure that in the rare case of a fence race condition only one node is killed and the other continues to run. Without this option set, it is possible that both nodes believe they are the only surviving node and simultaneously reset each other.
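The rule that only one node carries the delay option can be sketched as a small generator that prints, but does not run, one pcs stonith create command per node. This is an illustration under stated assumptions: print_ipmilan_commands is a hypothetical helper, the resource names it generates (ipmilan_<host>_fencing) differ from the names used above, and the login and password values are the placeholders from the example.

```shell
#!/bin/sh
# Print (do not run) one `pcs stonith create` command per node.
# Input lines on stdin: "<hostname> <ipmi-address>".
# Only the first node gets delay=5, so the two nodes cannot fence each other
# at exactly the same moment.
print_ipmilan_commands() {
    first=1
    while read -r host addr; do
        delay=""
        if [ "$first" -eq 1 ]; then
            delay=" delay=5"
            first=0
        fi
        echo "sudo pcs stonith create ipmilan_${host}_fencing fence_ipmilan" \
             "pcmk_host_list=${host}${delay} ipaddr=${addr}" \
             "login=root passwd=password lanplus=1 op monitor interval=60s"
    done
}

printf 'node1 203.0.113.1\nnode2 203.0.113.2\n' | print_ipmilan_commands
```

Review the printed commands before running them, and replace the placeholder credentials with values that match your own environment.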
Attention:
The IPMI LAN agent exposes the login credentials of the IPMI subsystem in plain text. Your security policy should ensure that it is acceptable for users with access to the Pacemaker configuration and tools to also have access to these credentials and the underlying subsystems concerned.
SCSI Fencing
        The SCSI Fencing agent is used to provide storage level fencing.
        This protects storage resources from being written to by two
        nodes at the same time, using SCSI-3 PR (Persistent
        Reservation). Used in conjunction with a watchdog service, a
        node can be reset automatically via stonith
        when it attempts to access the SCSI resource without a
        reservation.
      
                     
        To configure an environment in this way, install the watchdog
        service on both nodes and copy the provided
        fence_scsi_check script to the watchdog
        configuration before enabling the service:
      
                     
sudo yum install watchdog
sudo cp /usr/share/cluster/fence_scsi_check /etc/watchdog.d/
sudo systemctl enable --now watchdog
        To use this fencing agent, you must also enable the iscsid
        service provided in the iscsi-initiator-utils
        package on both nodes:
      
                     
sudo yum install -y iscsi-initiator-utils
sudo systemctl enable --now iscsid
        Once both nodes are configured with the watchdog service and the
        iscsid service, you can configure the
        fence_scsi fencing agent on one of the
        cluster nodes to monitor a shared storage device, such as an
        iSCSI target. For example:
      
                     
sudo pcs stonith create scsi_fencing fence_scsi pcmk_host_list="node1 node2" \
  devices="/dev/sdb" meta provides="unfencing"
In the example, node1 and node2 are the hostnames of the nodes in the cluster and /dev/sdb is the shared storage device. You should replace these variables with the appropriate information to match your own environment.
SBD Fencing
        Storage Based Death (SBD) is a daemon that can run on a system
        and monitor shared storage and that can use a messaging system
        to track cluster health. SBD can trigger a reset in the event
        that the appropriate fencing agent determines that
        stonith should be implemented.
      
                     
To set up and configure SBD fencing, stop the cluster by running the following command on one of the nodes:
sudo pcs cluster stop --all
On each node, install and configure the SBD daemon:
sudo yum install sbd
        Edit /etc/sysconfig/sbd to set the
        SBD_DEVICE parameter to identify the shared
        storage device. For example, if your shared storage device is
        available on /dev/sdc, edit the file
        to contain the line:
      
                     
SBD_DEVICE="/dev/sdc"
Enable the SBD service in systemd:
sudo systemctl enable --now sbd
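Because the SBD_DEVICE setting must identify the same shared device on every node, it can help to read back the value that the file actually holds. The following is a minimal sketch: sbd_device_from is a hypothetical helper name, and it assumes a single SBD_DEVICE line in the style shown above.

```shell
#!/bin/sh
# Hypothetical helper: print the value assigned to SBD_DEVICE in the given
# file, with surrounding double quotes removed. Commented-out lines are ignored.
sbd_device_from() {
    sed -n 's/^SBD_DEVICE="\{0,1\}\([^"]*\)"\{0,1\}$/\1/p' "$1"
}

# Example use on a node (commented out so the sketch has no side effects):
#   sbd_device_from /etc/sysconfig/sbd
```

Running the helper on each node and comparing the results is a quick way to catch a typo in one of the files.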
On one of the nodes, create the SBD messaging layout on the shared storage device and confirm that it is in place. For example, to set up and verify messaging on the shared storage device at /dev/sdc, run the following commands:
sudo sbd -d /dev/sdc create
sudo sbd -d /dev/sdc list
Finally, start the cluster and configure the fence_sbd fencing agent for the shared storage device. For example, to configure the shared storage device, /dev/sdc, run the following commands on one of the nodes:
sudo pcs cluster start --all
sudo pcs stonith create sbd_fencing fence_sbd devices=/dev/sdc
IF-MIB Fencing
        IF-MIB fencing takes advantage of SNMP to access the IF-MIB on
        an Ethernet network switch and to shut down the port on the
        switch to effectively take a host offline. This leaves the host
        running, but disconnects it from the network. It is worth
        bearing in mind that any FibreChannel or InfiniBand connections
        could remain intact, even after the Ethernet connection has been
        terminated, which could mean that data made available on these
        connections could still be at risk. As a result, it is best to
        configure this as a fallback fencing mechanism. See
        Configuring Fencing Levels for more
        information on how to use multiple fencing agents together to
        maximise stonith success.
      
                     
To configure IF-MIB fencing, ensure that your switch is configured for SNMP v2c at minimum and that SNMP SET messages are enabled. For example, on an Oracle Switch, via the ILOM CLI, you could run:
set /SP/services/snmp/ sets=enabled
set /SP/services/snmp/ v2c=enabled
 On one of the nodes in your cluster, configure the fence_ifmib fencing
      agent for each node in your environment. For example: 
                     
sudo pcs stonith create ifmib_n1_fencing fence_ifmib pcmk_host_list=node1 \
  ipaddr=203.0.113.10 community=private port=1 delay=5 op monitor interval=60s
sudo pcs stonith create ifmib_n2_fencing fence_ifmib pcmk_host_list=node2 \
  ipaddr=203.0.113.10 community=private port=2 op monitor interval=60s
In the above example, the switch SNMP IF-MIB is accessible at the IP address 203.0.113.10. The host node1 is connected to port 1 on the switch. The host node2 is connected to port 2 on the switch. You should replace these variables with the appropriate information to match your own environment.
Configuring Fencing Levels
If you have configured multiple fencing agents, you may want to set different fencing levels. Fencing levels allow you to prioritize different approaches to fencing and can provide a valuable mechanism to provide fallback options should a default fencing approach fail.
Each fencing level is attempted in ascending order starting from level 1. If the fencing agent configured for a particular level fails, the fencing agent from the next level is attempted instead.
For example, you may wish to configure IPMI LAN fencing at level 1, but fall back to IF-MIB fencing as a level 2 option. Using the example configurations from IPMI LAN Fencing and IF-MIB Fencing, you could run the following commands on one of the nodes to set the fencing levels for each configured agent:
sudo pcs stonith level add 1 node1 ipmilan_n1_fencing
sudo pcs stonith level add 1 node2 ipmilan_n2_fencing
sudo pcs stonith level add 2 node1 ifmib_n1_fencing
sudo pcs stonith level add 2 node2 ifmib_n2_fencing
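The ascending-order fallback described in this section can be illustrated with a few lines of shell. This sketch only mimics the ordering and is not how Pacemaker is implemented: try_levels is a hypothetical function, and the true and false commands stand in for fencing agents that succeed or fail.

```shell
#!/bin/sh
# Illustration only: try each command in order (level 1 first) and stop at the
# first one that succeeds.
try_levels() {
    level=1
    for cmd in "$@"; do
        if $cmd; then
            echo "fenced at level $level"
            return 0
        fi
        level=$((level + 1))
    done
    echo "all levels failed"
    return 1
}

# "false" stands in for a failing IPMI agent, "true" for a working IF-MIB agent.
try_levels false true
```

In this run the level 1 stand-in fails, so the function moves on and reports success at level 2, which mirrors IPMI LAN fencing falling back to IF-MIB fencing.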