6 Adding or Removing Nodes to an Existing Cluster
This chapter provides instructions for adding or removing nodes to and from an existing cluster.
Adding a New Control Plane Node to a Cluster
To add a new control plane node to a cluster, do the following:
- Prepare the new hosts, as described in Setting Up the Network and Enabling Access to the Oracle Linux Automation Manager Packages.
- Configure the host, following the instructions in Setting up Hosts. Don't run the awx-manage migrate or awx-manage createsuperuser commands; these only need to be run when initially creating the cluster.
- Set up the service mesh for the control plane node, by following the instructions in Configuring and Starting the Control Plane Service Mesh.
- Set up the service mesh for the execution plane nodes you want to connect to your new control plane node, by following the instructions in Configuring and Starting the Execution Plane Service Mesh.
- Set up the hop nodes you want to connect to your new control plane node, by following the instructions in Configuring and Starting the Hop Nodes.
- Provision the node as the control node type, register the node to an appropriate instance group (called a queuename in the command), and establish the peer relationships between the execution, hop, and the control nodes as described in Configuring the Control, Execution, and Hop Nodes.
- Start the control plane node as described in Starting the Control, Execution, and Hop Nodes. Don't run the command to create preloaded data.
- If required, apply TLS verification and signed work requests as described in Configuring TLS Verification and Signed Work Requests.
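As background for the service-mesh steps above, a control plane node's /etc/receptor/receptor.conf typically declares the node ID and a TCP listener that execution and hop nodes peer with. The fragment below is a hypothetical sketch only; the node ID and port are illustrative values, not taken from this document, and the authoritative settings are in Configuring and Starting the Control Plane Service Mesh:

```yaml
---
- node:
    id: 192.0.125.100   # hypothetical control plane node ID

- log-level: info

- tcp-listener:
    port: 27199         # port that execution and hop nodes peer with

- control-service:
    service: control
    filename: /var/run/receptor/receptor.sock
```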
Adding a New Execution Plane Node to a Cluster
To add a new execution node to a cluster, do the following:
- Prepare the new hosts, as described in Setting Up the Network and Enabling Access to the Oracle Linux Automation Manager Packages.
- Configure the host, following the instructions in Setting up Hosts. Don't run the awx-manage migrate or awx-manage createsuperuser commands; these only need to be run when initially creating the cluster.
- Set up the service mesh for the execution plane node, by following the instructions in Configuring and Starting the Execution Plane Service Mesh.
- Provision the node as the execution node type, register the node to an appropriate instance group (called a queuename in the command), and establish the peer relationships between the execution node and the control plane nodes or between the execution node and the hop nodes as described in Configuring the Control, Execution, and Hop Nodes.
- Start the execution plane node as described in Starting the Control, Execution, and Hop Nodes. Don't run the command to create preloaded data.
- If required, apply TLS verification and signed work requests as described in Configuring TLS Verification and Signed Work Requests.
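The provisioning step above can be sketched as an awx-manage session run from a control plane node. The host names, queue name, and peer address below are hypothetical placeholders; the exact commands and values for your cluster are given in Configuring the Control, Execution, and Hop Nodes:

```
sudo su -l awx -s /bin/bash
awx-manage provision_instance --hostname=192.0.121.28 --node_type=execution
awx-manage register_queue --queuename=execution --hostnames=192.0.121.28
awx-manage register_peers 192.0.121.28 --peers 192.0.119.192
exit
```

Here 192.0.121.28 stands in for the new execution node and 192.0.119.192 for the control plane node it peers with.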
Adding a New Hop Node to a Cluster
To add a new hop node to a cluster, do the following:
- Prepare the new hosts, as described in Setting Up the Network and Enabling Access to the Oracle Linux Automation Manager Packages.
- Configure the host, following the instructions in Setting up Hosts. Don't run the awx-manage migrate or awx-manage createsuperuser commands; these only need to be run when initially creating the cluster.
- Set up the hop nodes you want to connect to your control plane nodes, by following the instructions in Configuring and Starting the Hop Nodes.
- Set up the execution nodes you want to connect to your new hop node, by following the instructions in Configuring and Starting the Execution Plane Service Mesh.
- Provision the node as the hop node type, and for any new execution nodes, register the execution node to the execution instance group (called a queuename in the command), and establish the peer relationships between the execution, hop, and control nodes as described in Configuring the Control, Execution, and Hop Nodes.
- Start the hop node and execution nodes as described in Starting the Control, Execution, and Hop Nodes. Don't run the command to create preloaded data.
- If required, apply TLS verification and signed work requests as described in Configuring TLS Verification and Signed Work Requests.
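A hop node sits between control and execution nodes, so its /etc/receptor/receptor.conf typically both peers upward to a control plane node and listens for connections from execution nodes. The fragment below is a hypothetical sketch; the addresses and port are illustrative, and the authoritative settings are in Configuring and Starting the Hop Nodes:

```yaml
---
- node:
    id: 192.0.126.10    # hypothetical hop node ID

- tcp-listener:
    port: 27199         # execution nodes peer with the hop node here

- tcp-peer:
    address: 192.0.119.192:27199   # hypothetical control plane node
```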
Removing a Node from a Cluster
To remove a node from a cluster, do the following:
- Log on to the node you want to remove.
- Stop Oracle Linux Automation Manager on the node.
sudo systemctl disable ol-automation-manager.service --now
- Stop the service mesh.
sudo systemctl disable receptor-awx --now
- Delete the /etc/tower/SECRET_KEY file.
- Open the /etc/tower/settings.py file and remove the database password from the DATABASES section, or remove any configuration that provides a password for your database if you are using alternative approaches. Check for passwords in any custom settings files in /etc/tower/conf.d.
- From any control plane node, verify that the node you want to remove no longer shows capacity or heartbeat information. For example, the following shows that the node with IP address 192.0.124.44 has zero capacity and no heartbeat information.
sudo su -l awx -s /bin/bash
awx-manage list_instances
[controlplane capacity=126]
	192.0.119.192 capacity=126 node_type=control version=19.5.1 heartbeat="2022-10-20 06:55:44"
	192.0.124.44 capacity=0 node_type=control version=19.5.1
[execution capacity=126]
	192.0.114.137 capacity=126 node_type=execution version=19.5.1 heartbeat="2022-10-20 06:56:20"
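A node ready for removal can be spotted mechanically in the list_instances output: its line reports capacity=0 and carries no heartbeat field. The sketch below filters sample output (inlined here so the snippet is self-contained; on a real cluster you would pipe the live awx-manage list_instances output instead):

```shell
# Sample output, as produced by `awx-manage list_instances` above.
list_output='[controlplane capacity=126]
	192.0.119.192 capacity=126 node_type=control version=19.5.1 heartbeat="2022-10-20 06:55:44"
	192.0.124.44 capacity=0 node_type=control version=19.5.1
[execution capacity=126]
	192.0.114.137 capacity=126 node_type=execution version=19.5.1 heartbeat="2022-10-20 06:56:20"'

# Print the address of every node reporting zero capacity.
echo "$list_output" | awk '/capacity=0/ { print $1 }'
```

For the sample above, this prints 192.0.124.44, the node to deprovision.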
- Deprovision the instance from the cluster.
awx-manage deprovision_instance --hostname=<IP address or host name>
In the command, replace <IP address or host name> with the host you want to remove from the cluster.
- Check the status of the remaining control and execution plane nodes to verify that the
deprovisioned instance no longer appears. For example, the deprovisioned node with IP
address 192.0.124.44 from the previous example no longer appears:
awx-manage list_instances
[controlplane capacity=126]
	192.0.119.192 capacity=126 node_type=control version=19.5.1 heartbeat="2022-10-20 06:55:44"
[execution capacity=126]
	192.0.114.137 capacity=126 node_type=execution version=19.5.1 heartbeat="2022-10-20 06:56:20"
- Exit the awx shell environment.
exit
- If required, remove any tcp-peer entries pointing to the deprovisioned node in the /etc/receptor/receptor.conf files of the remaining cluster nodes, then restart the service mesh on those nodes.
sudo systemctl restart receptor-awx