5 Installing Oracle Linux Automation Manager in a Clustered Deployment
This chapter discusses how to prepare hosts in an Oracle Linux Automation Manager multihost deployment. When you prepare the hosts, you must install the Oracle Linux Automation Manager software packages and configure them as part of the Oracle Linux Automation Manager service mesh. Configure and start the Service Mesh nodes before configuring and starting the control plane and execution plane nodes.
Configuring and Starting the Control Plane Service Mesh
On each control plane node, the Service Mesh is configured in the /etc/receptor/receptor.conf file. This file contains the following elements:
- node ID: The node ID must be the IP address or host name of the host.
- log-level: Available options are: Error, Warning, Info and Debug. Log level options provide increasing verbosity, such that Error generates the least information and Debug generates the most.
- tcp-listener port: The port on which the node listens for incoming TCP peer connections from other nodes. For example, if the node ID represents a control node that listens on port 27199, then every other node that establishes a connection to this control node must specify port 27199 in the tcp-peer element of its /etc/receptor/receptor.conf file.
- control-service: All nodes in a cluster run the control service which reports status and launches and monitors work.
- work-command: This element defines the type of work that can be done on a node. For control plane nodes, the work type is always local. The command it runs is the Ansible Runner tool, which provides an abstraction layer for running Ansible and Ansible playbook tasks and can be configured to send status and event data to other systems. For more information about Ansible Runner, see https://ansible-runner.readthedocs.io/en/stable/.
On each host intended for use as a control plane node, do the following:
- Remove any default configuration for Receptor and edit /etc/receptor/receptor.conf to contain the following configuration with control plane specific information:
---
- node:
    id: <IP address or host name>
- log-level: info
- tcp-listener:
    port: <port_number>
- control-service:
    service: control
    filename: /var/run/receptor/receptor.sock
- work-command:
    worktype: local
    command: /var/lib/ol-automation-manager/venv/awx/bin/ansible-runner
    params: worker
    allowruntimeparams: true
    verifysignature: false
In the previous example, IP address or hostname is the IP address or hostname of the node and port_number is the port number that this node is listening on. For example, you could use a node ID such as control1-192.0.121.28, which combines a node name with the IP address of the node, and configure the tcp-listener to listen on port 27199. The worktype parameter must be local in control plane nodes. A filled-in example follows this procedure.
- Start the Oracle Linux Automation Manager mesh service.
sudo systemctl start receptor-awx
- Verify the Service Mesh. For more information, see Viewing Service Mesh Status for a Cluster Node.
Note:
At this point in the process, the peer relationships between service mesh nodes haven't been established yet. Status information only exists for the individual servers running the Service Mesh.
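The following is a hypothetical filled-in /etc/receptor/receptor.conf for a control plane node with the IP address 192.0.121.28 that listens on port 27199; substitute the node ID and port used in the environment:
---
- node:
    id: 192.0.121.28
- log-level: info
- tcp-listener:
    port: 27199
- control-service:
    service: control
    filename: /var/run/receptor/receptor.sock
- work-command:
    worktype: local
    command: /var/lib/ol-automation-manager/venv/awx/bin/ansible-runner
    params: worker
    allowruntimeparams: true
    verifysignature: false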
Configuring and Starting the Execution Plane Service Mesh
On each execution plane node, the Service Mesh is configured in the /etc/receptor/receptor.conf file. This file contains the following elements:
- node ID: The node ID must be the IP address or hostname of the host.
- log-level: Available options are: Error, Warning, Info and Debug. Log level options provide increasing verbosity, such that Error generates the least information and Debug generates the most.
- tcp-listener port: The port on which the node listens for incoming TCP peer connections from other nodes. For example, if the node ID represents an execution node that listens on port 27199, then every other node that establishes a connection to this execution node must specify port 27199 in the tcp-peer element of its /etc/receptor/receptor.conf file.
- tcp-peer port: This element must include the hostname and port number of the host it's connecting with. For example, if this execution node needs to connect to more than one control plane node to provide redundancy, you would add a tcp-peer element for each control plane node that the execution node connects with. In the address field, enter the host name or IP address of the control plane node, followed by the port number. The redial element, if enabled, tries to periodically reestablish a connection to the host if connectivity fails. You can also configure tcp-peer elements to include the hostnames and port numbers of other execution nodes or hop nodes, based on the service mesh topology requirements.
- control-service: All nodes in a cluster run the control service which reports status and launches and monitors work.
- work-command: This element defines the type of work that can be done on a node. For execution plane nodes, the work type is always ansible-runner. The command it runs is the Ansible Runner tool, which provides an abstraction layer for running Ansible and Ansible playbook tasks and can be configured to send status and event data to other systems. For more information about Ansible Runner, see https://ansible-runner.readthedocs.io/en/stable/.
On each host intended for use as an execution plane node, do the following:
- Remove any default configuration for Receptor and edit /etc/receptor/receptor.conf to contain the following configuration with execution plane specific information:
---
- node:
    id: <IP address or hostname>
- log-level: debug
- tcp-listener:
    port: <port_number>
- tcp-peer:
    address: <hostname or IP address>:<target_port_number>
    redial: true
- tcp-peer:
    address: <hostname or IP address>:<target_port_number>
    redial: true
- control-service:
    service: control
    filename: /var/run/receptor/receptor.sock
- work-command:
    worktype: ansible-runner
    command: /var/lib/ol-automation-manager/venv/awx/bin/ansible-runner
    params: worker
    allowruntimeparams: true
    verifysignature: false
In the previous example,
- IP address or hostname is the IP address or hostname of the node.
- port_number is the port number that this node is listening on.
- target_port_number is the port number of the peer node that you're configuring this node to connect with.
- hostname or IP address is the hostname or IP address of the execution, control, or hop node being connected with.
- The worktype parameter must be ansible-runner in execution plane nodes.
If the execution node is associated with more than one control, execution, or hop node, add a tcp-peer: entry for each instance that the execution host is associated with. A filled-in example follows this procedure.
- Start the Oracle Linux Automation Manager mesh service.
sudo systemctl start receptor-awx
- Verify the Service Mesh. For more information, see Viewing Service Mesh Status for a Cluster Node.
Note:
At this point in the process, the peer relationships between service mesh nodes haven't been established yet. Status information only exists for the individual servers running the Service Mesh.
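The following is a hypothetical filled-in /etc/receptor/receptor.conf for an execution node with the IP address 192.0.114.137 that peers with two control plane nodes, 192.0.119.192 and 192.0.124.44, each listening on port 27199. The addresses reuse the example cluster shown later in this chapter; substitute the values used in the environment:
---
- node:
    id: 192.0.114.137
- log-level: debug
- tcp-listener:
    port: 27199
- tcp-peer:
    address: 192.0.119.192:27199
    redial: true
- tcp-peer:
    address: 192.0.124.44:27199
    redial: true
- control-service:
    service: control
    filename: /var/run/receptor/receptor.sock
- work-command:
    worktype: ansible-runner
    command: /var/lib/ol-automation-manager/venv/awx/bin/ansible-runner
    params: worker
    allowruntimeparams: true
    verifysignature: false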
Configuring and Starting the Hop Nodes
On each hop node, the Service Mesh is configured in the /etc/receptor/receptor.conf file. This file contains the following elements:
- node ID: The node ID must be the IP address or hostname of the host.
- log-level: Available options are: Error, Warning, Info and Debug. Log level options provide increasing verbosity, such that Error generates the least information and Debug generates the most.
- tcp-listener port: The port on which the node listens for incoming TCP peer connections from other nodes. For example, if the node ID represents a hop node that listens on port 27199, then every other node that establishes a connection to this hop node must specify port 27199 in the tcp-peer element of its /etc/receptor/receptor.conf file.
- tcp-peer port: This element must include the hostname and port number of the host it is connecting with. For example, you might configure a hop node to connect to a control node as the intermediate node between the control node and an execution node. In the address field, enter the host name or IP address of the control plane node, followed by the port number. The redial element, if enabled, tries to periodically reestablish a connection to the host if connectivity fails.
- control-service: All nodes in a cluster run the control service which reports status and launches and monitors work.
- work-command: This element defines the type of work that can be done on a node. Hop nodes don't run playbooks. However, you must configure the default fields. The work type for hop nodes is always ansible-runner.
On each host intended for use as a hop node, do the following:
- Remove any default configuration for Receptor and edit /etc/receptor/receptor.conf to contain the following configuration with hop node specific information:
---
- node:
    id: <node IP address or hostname>
- log-level: debug
- tcp-listener:
    port: <port_number>
- tcp-peer:
    address: <control hostname or IP address>:<target_port_number>
    redial: true
- tcp-peer:
    address: <control hostname or IP address>:<target_port_number>
    redial: true
- control-service:
    service: control
    filename: /var/run/receptor/receptor.sock
- work-command:
    worktype: ansible-runner
    command: /var/lib/ol-automation-manager/venv/awx/bin/ansible-runner
    params: worker
    allowruntimeparams: true
    verifysignature: false
In the previous example,
- node IP address or hostname is the IP address or hostname of the node.
- port_number is the port number that this node is listening on.
- target_port_number is the port number of the peer node that you're configuring this node to connect with.
- control hostname or IP address is the hostname or IP address of the control node that the hop node is connecting with.
If the hop node is associated with more than one control node, add a tcp-peer: entry for each instance that the hop node is associated with. A filled-in example follows this procedure.
- Start the Oracle Linux Automation Manager mesh service.
sudo systemctl start receptor-awx
- Verify the Service Mesh. For more information, see Viewing Service Mesh Status for a Cluster Node.
Note:
At this point in the process, the peer relationships between service mesh nodes haven't been established yet. Status information only exists for the individual servers running the Service Mesh.
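The following is a hypothetical filled-in /etc/receptor/receptor.conf for a hop node with the IP address 192.0.123.77 that peers with a control plane node at 192.0.119.192 on port 27199; an isolated execution node would then name this hop node, rather than a control node, in its own tcp-peer element:
---
- node:
    id: 192.0.123.77
- log-level: debug
- tcp-listener:
    port: 27199
- tcp-peer:
    address: 192.0.119.192:27199
    redial: true
- control-service:
    service: control
    filename: /var/run/receptor/receptor.sock
- work-command:
    worktype: ansible-runner
    command: /var/lib/ol-automation-manager/venv/awx/bin/ansible-runner
    params: worker
    allowruntimeparams: true
    verifysignature: false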
Configuring the Control, Execution, and Hop Nodes
To configure the control plane, execution plane, and hop nodes, do the following steps on one control plane host. The configuration applies to all Oracle Linux Automation Manager instances:
- Run the following command to open the awx shell environment:
sudo su -l awx -s /bin/bash
- Do the following:
- Repeat the following command for each host that you want to use as a control node, changing the IP address or host name each time you run the command:
awx-manage provision_instance --hostname=<control hostname or IP address> --node_type=control
In the previous example, control hostname or IP address is the hostname or IP address of the system running Oracle Linux Automation Manager. The host name or IP address must match the host name or IP address used when you configured the node ID in the /etc/receptor/receptor.conf file (see Configuring and Starting the Control Plane Service Mesh). If a hostname is used, the host must be resolvable.
- Repeat the following command for each host that you want to use as an execution node, changing the IP address or host name each time you run the command:
awx-manage provision_instance --hostname=<execution hostname or IP address> --node_type=execution
In the previous example, execution hostname or IP address is the hostname or IP address of the system running Oracle Linux Automation Manager. The host name or IP address must match the host name or IP address used when you configured the node ID in the /etc/receptor/receptor.conf file (see Configuring and Starting the Execution Plane Service Mesh). If a hostname is used, the host must be resolvable.
- Repeat the following command for each host that you want to use as a hop node, changing the IP address or host name each time you run the command:
awx-manage provision_instance --hostname=<hop hostname or IP address> --node_type=hop
In the previous example, hop hostname or IP address is the hostname or IP address of the system running Oracle Linux Automation Manager. The host name or IP address must match the host name or IP address used when you configured the node ID in the /etc/receptor/receptor.conf file (see Configuring and Starting the Hop Nodes). If a hostname is used, the host must be resolvable.
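For example, for a cluster that uses the node addresses shown in the list_instances output later in this procedure, the provisioning commands might look like the following:
awx-manage provision_instance --hostname=192.0.119.192 --node_type=control
awx-manage provision_instance --hostname=192.0.124.44 --node_type=control
awx-manage provision_instance --hostname=192.0.114.137 --node_type=execution
awx-manage provision_instance --hostname=192.0.117.98 --node_type=execution
awx-manage provision_instance --hostname=192.0.125.241 --node_type=execution
awx-manage provision_instance --hostname=192.0.123.77 --node_type=hop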
- Run the following command to register the default execution environments, which are:
- Control Plane Execution Environment
- OLAM EE (2.2)
awx-manage register_default_execution_environments
- Run the following command to create the controlplane instance group (specified as a queue in the command) and associate it with a control plane host. Repeat the command with the same queue name for each control plane host in the cluster:
awx-manage register_queue --queuename=controlplane --hostnames=<control hostname or IP address>
- Run the following command to create the execution instance group and associate it with an execution plane host. Repeat the command with the same queue name for each execution plane host in the cluster:
awx-manage register_queue --queuename=execution --hostnames=<execution hostname or IP address>
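For example, using the same example addresses, the two queues might be registered as follows:
awx-manage register_queue --queuename=controlplane --hostnames=192.0.119.192
awx-manage register_queue --queuename=controlplane --hostnames=192.0.124.44
awx-manage register_queue --queuename=execution --hostnames=192.0.114.137
awx-manage register_queue --queuename=execution --hostnames=192.0.117.98
awx-manage register_queue --queuename=execution --hostnames=192.0.125.241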
- Run the awx-manage list_instances command to ensure that each host you registered is available under the correct instance group. For example, the following shows the IP addresses of two control plane and three execution plane nodes running under the controlplane and execution instance groups, and one hop node listed as ungrouped. The nodes aren't running yet, and therefore don't show available capacity or heartbeat information.
awx-manage list_instances
[controlplane capacity=0]
    192.0.119.192 capacity=0 node_type=control version=?
    192.0.124.44 capacity=0 node_type=control version=?
[execution capacity=0]
    192.0.114.137 capacity=0 node_type=execution version=ansible-runner-???
    192.0.117.98 capacity=0 node_type=execution version=ansible-runner-???
    192.0.125.241 capacity=0 node_type=execution version=ansible-runner-???
[ungrouped capacity=0]
    192.0.123.77 node_type=hop version=ansible-runner-???
- Run the following command to register the Oracle Linux Automation Manager service mesh peer relationships between the nodes in the cluster:
awx-manage register_peers <execution or hop hostname or IP address> --peers <execution, hop, or control hostname or IP address>
This command must be run for each pair of nodes requiring a peer relationship. For example, in the topology described in Service Mesh Topology Examples, the command is run twice for each execution node so that each execution node is connected to a different control node. This ensures that each execution node always has a backup control node if one of the control nodes fails.
Other topologies are possible, such as those where an isolated execution node must peer to a hop node, and the hop node must peer to a control node. In this case, the command must be run one time to peer the execution node with the hop node, and again to peer the hop node with the control node.
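For example, using the example node addresses from the list_instances output, one illustrative (not prescriptive) set of peer registrations connects two execution nodes to both control nodes and routes the third execution node through the hop node:
awx-manage register_peers 192.0.114.137 --peers 192.0.119.192
awx-manage register_peers 192.0.114.137 --peers 192.0.124.44
awx-manage register_peers 192.0.117.98 --peers 192.0.119.192
awx-manage register_peers 192.0.117.98 --peers 192.0.124.44
awx-manage register_peers 192.0.125.241 --peers 192.0.123.77
awx-manage register_peers 192.0.123.77 --peers 192.0.119.192
Whatever pairing is chosen, it must match the tcp-peer relationships configured in the /etc/receptor/receptor.conf files earlier in this chapter.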
- Exit the awx shell environment.
exit
- For each control and execution plane host, create a custom settings file in /etc/tower/conf.d/filename.py and include the following:
DEFAULT_EXECUTION_QUEUE_NAME = 'execution'
DEFAULT_CONTROL_PLANE_QUEUE_NAME = 'controlplane'
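For example, a minimal sketch that creates the file under the hypothetical name custom.py; the ownership and mode shown here mirror the tls.py step later in this chapter and are an assumption rather than a documented requirement:
sudo tee /etc/tower/conf.d/custom.py > /dev/null <<'EOF'
DEFAULT_EXECUTION_QUEUE_NAME = 'execution'
DEFAULT_CONTROL_PLANE_QUEUE_NAME = 'controlplane'
EOF
sudo chown awx:awx /etc/tower/conf.d/custom.py
sudo chmod 0640 /etc/tower/conf.d/custom.py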
Starting the Control, Execution, and Hop Nodes
To start the control, execution, and hop nodes, do the following:
- Start the service on each node:
sudo systemctl enable --now ol-automation-manager.service
- On one control plane node, run the following commands to preload data, such as:
- Demo Project
- Default Galaxy Credentials
- Demo Organization
- Demo Inventory
- Demo Job template
- And so on
sudo su -l awx -s /bin/bash
awx-manage create_preload_data
Note:
You only need to run this command one time because the preloaded data persists in the database that all cluster nodes connect with.
- Run the awx-manage list_instances command to ensure that the control and execution plane nodes are now running, show available capacity, and display heartbeat information. For example, the following shows all control and execution plane instances running, with available capacity and active heartbeat information.
awx-manage list_instances
[controlplane capacity=270]
    192.0.119.192 capacity=135 node_type=control version=19.5.1 heartbeat="2022-09-22 14:38:29"
    192.0.124.44 capacity=135 node_type=control version=19.5.1 heartbeat="2022-09-22 14:39:09"
[execution capacity=405]
    192.0.114.137 capacity=135 node_type=execution version=19.5.1 heartbeat="2022-09-22 14:40:07"
    192.0.117.98 capacity=135 node_type=execution version=19.5.1 heartbeat="2022-09-22 14:40:35"
    192.0.125.241 capacity=135 node_type=execution version=19.5.1 heartbeat="2022-09-22 14:40:55"
[ungrouped capacity=0]
    192.0.123.77 node_type=hop heartbeat="2024-09-20 13:26:44"
- Exit the awx shell environment.
exit
Configuring TLS Verification and Signed Work Requests
We recommend that you secure Service Mesh communication within the cluster with TLS verification and with signing of the work requests sent between cluster nodes. TLS verification ensures secure communication in the Service Mesh network, and signed work requests ensure secure job execution.
- On each host in the cluster (each execution, hop, and control plane node), create a file for TLS settings. For example, /etc/tower/conf.d/tls.py.
- To enable signed work, add the following text to /etc/tower/conf.d/tls.py:
RECEPTOR_NO_SIG = False
- Set the ownership and permissions for the custom settings file:
sudo chown awx:awx /etc/tower/conf.d/tls.py
sudo chmod 0640 /etc/tower/conf.d/tls.py
- From one of the control nodes, in the /etc/tower folder, run the following:
sudo mkdir -p certs
sudo receptor --cert-init commonname="test CA" bits=2048 outcert=certs/ca.crt outkey=certs/ca.key
- Do the following for each node in the cluster:
- If you're using IP addresses for the node_id field, run the following commands to generate TLS certificates:
node=<node_id>; sudo receptor --cert-makereq bits=2048 commonname="$node test cert" ipaddress=$node nodeid=$node outreq=certs/$node.csr outkey=certs/$node.key
node=<node_id>; sudo receptor --cert-signreq req=certs/$node.csr cacert=certs/ca.crt cakey=certs/ca.key outcert=certs/$node.crt
In the previous example, node_id is the IP address of the node that you're creating keys for, as set in the /etc/receptor/receptor.conf file for the execution, hop, or control plane nodes.
- If you're using a host name for the node_id field, run the following commands to generate TLS certificates:
node=<node_id>; sudo receptor --cert-makereq bits=2048 commonname="$node test cert" dnsname=$node nodeid=$node outreq=certs/$node.csr outkey=certs/$node.key
node=<node_id>; sudo receptor --cert-signreq req=certs/$node.csr cacert=certs/ca.crt cakey=certs/ca.key outcert=certs/$node.crt
In the previous example, node_id is the host name of the node that you're creating the keys for, as set in the /etc/receptor/receptor.conf file for the execution, hop, or control plane nodes.
- After the second command, type yes to confirm that you want to sign the certificate. For example, the following generates certificates for a cluster with two hosts:
node=192.0.250.40; sudo receptor --cert-makereq bits=2048 commonname="$node test cert" ipaddress=192.0.250.40 nodeid=$node outreq=certs/$node.csr outkey=certs/$node.key
node=192.0.250.40; sudo receptor --cert-signreq req=certs/$node.csr cacert=certs/ca.crt cakey=certs/ca.key outcert=certs/$node.crt
Requested certificate:
    Subject: CN=192.0.250.40 test cert
    Encryption Algorithm: RSA (2048 bits)
    Signature Algorithm: SHA256-RSA
    Names:
        IP Address: 192.0.250.40
        Receptor Node ID: 192.0.250.40
Sign certificate (yes/no)? yes
node=192.0.251.206; sudo receptor --cert-makereq bits=2048 commonname="$node test cert" ipaddress=192.0.251.206 nodeid=$node outreq=certs/$node.csr outkey=certs/$node.key
node=192.0.251.206; sudo receptor --cert-signreq req=certs/$node.csr cacert=certs/ca.crt cakey=certs/ca.key outcert=certs/$node.crt
Requested certificate:
    Subject: CN=192.0.251.206 test cert
    Encryption Algorithm: RSA (2048 bits)
    Signature Algorithm: SHA256-RSA
    Names:
        IP Address: 192.0.251.206
        Receptor Node ID: 192.0.251.206
Sign certificate (yes/no)? yes
- From the /etc/tower/certs folder, run the following commands to generate the keys used for work request signing and verification:
sudo openssl genrsa -out signworkprivate.pem 2048
sudo openssl rsa -in signworkprivate.pem -pubout -out signworkpublic.pem
- From the /etc/tower folder, run the following command to change the ownership of the certs folder and all files within it:
sudo chown -R awx:awx certs
- Check that you have all the files you need in the /etc/tower/certs folder. For example, the following shows the generated key information for a four node cluster.
ls -al
total 68
drwxr-xr-x. 2 awx awx 4096 Sep 12 18:26 .
drwxr-xr-x. 4 awx awx  132 Sep 12 16:49 ..
-rw-------. 1 awx awx 1180 Sep 12 18:19 192.0.113.178.crt
-rw-------. 1 awx awx 1001 Sep 12 18:19 192.0.113.178.csr
-rw-------. 1 awx awx 1679 Sep 12 18:19 192.0.113.178.key
-rw-------. 1 awx awx 1176 Sep 12 18:20 192.0.121.28.crt
-rw-------. 1 awx awx 1001 Sep 12 18:20 192.0.121.28.csr
-rw-------. 1 awx awx 1675 Sep 12 18:20 192.0.121.28.key
-rw-------. 1 awx awx 1180 Sep 12 18:20 192.0.126.172.crt
-rw-------. 1 awx awx 1001 Sep 12 18:19 192.0.126.172.csr
-rw-------. 1 awx awx 1679 Sep 12 18:19 192.0.126.172.key
-rw-------. 1 awx awx 1176 Sep 12 18:19 192.0.127.70.crt
-rw-------. 1 awx awx 1001 Sep 12 18:19 192.0.127.70.csr
-rw-------. 1 awx awx 1675 Sep 12 18:19 192.0.127.70.key
-rw-------. 1 awx awx 1107 Sep 12 16:54 ca.crt
-rw-------. 1 awx awx 1679 Sep 12 16:54 ca.key
-rw-------. 1 awx awx 1675 Sep 12 18:26 signworkprivate.pem
-rw-r--r--. 1 awx awx  451 Sep 12 18:26 signworkpublic.pem
- On each node in the cluster, in the /etc/tower folder, create a certs folder and change the ownership and group of the certs folder to awx:awx:
sudo mkdir -p certs
sudo chown -R awx:awx certs
- Copy the ca.crt file, the node specific .crt, .csr, and .key files, and the signworkprivate.pem and signworkpublic.pem files to each node in the cluster.
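For example, a minimal sketch using scp from the /etc/tower/certs folder on the control node where the certificates were generated; the root login and the follow-up ownership reset are assumptions, so use whichever copy mechanism and accounts the environment allows:
sudo scp ca.crt 192.0.250.40.crt 192.0.250.40.csr 192.0.250.40.key signworkprivate.pem signworkpublic.pem root@192.0.250.40:/etc/tower/certs/
ssh root@192.0.250.40 'chown -R awx:awx /etc/tower/certs'
Repeat for each remaining node in the cluster, copying that node's .crt, .csr, and .key files along with the CA certificate and the signing keys.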
- For each control plane node, add the following lines to the /etc/receptor/receptor.conf file:
---
- node:
    id: <IP address or host name>
- log-level: debug
# Add the tls: control that specifies the tls-server name for the listener
- tcp-listener:
    port: 27199
    tls: controller
# Add the TLS server configuration
- tls-server:
    name: controller
    cert: /etc/tower/certs/<IP address or host name>.crt
    key: /etc/tower/certs/<IP address or host name>.key
    requireclientcert: true
    clientcas: /etc/tower/certs/ca.crt
- control-service:
    service: control
    filename: /var/run/receptor/receptor.sock
# Add the work-signing and work-verification elements
- work-signing:
    privatekey: /etc/tower/certs/signworkprivate.pem
    tokenexpiration: 30m
- work-verification:
    publickey: /etc/tower/certs/signworkpublic.pem
# Set verifysignature to true.
- work-command:
    worktype: local
    command: /var/lib/ol-automation-manager/venv/awx/bin/ansible-runner
    params: worker
    allowruntimeparams: true
    verifysignature: true
In the previous example, IP address or host name is the host name or IP address of the control plane host. If host name is used, the host must be resolvable.
- For each execution plane node, add the following lines to the /etc/receptor/receptor.conf file:
---
- node:
    id: <execution IP address or host name>
- log-level: debug
- tcp-listener:
    port: 27199
# Add tls: client that specifies the tls-client name.
- tcp-peer:
    address: <hostname or IP address>:27199
    redial: true
    tls: client
- tcp-peer:
    address: <hostname or IP address>:27199
    redial: true
    tls: client
# Add the tls-client element.
- tls-client:
    name: client
    rootcas: /etc/tower/certs/ca.crt
    insecureskipverify: false
    cert: /etc/tower/certs/<execution IP address or host name>.crt
    key: /etc/tower/certs/<execution IP address or host name>.key
- control-service:
    service: control
    filename: /var/run/receptor/receptor.sock
# Add the work-verification element.
- work-verification:
    publickey: /etc/tower/certs/signworkpublic.pem
# Set verifysignature to true.
- work-command:
    worktype: ansible-runner
    command: /var/lib/ol-automation-manager/venv/awx/bin/ansible-runner
    params: worker
    allowruntimeparams: true
    verifysignature: true
In the previous example,
- execution IP address or host name is the IP address or host name of the node.
- hostname or IP address is the host name or IP address of the execution, control, or hop node you're peering with.
- (If required) For each hop node, add the following lines to the /etc/receptor/receptor.conf file:
---
- node:
    id: <node IP address or hostname>
- log-level: debug
# Add the tls: control that specifies the tls-server name for the listener
- tcp-listener:
    port: 27199
    tls: controller
# Add tls: client that specifies the tls-client name.
- tcp-peer:
    address: <control hostname or IP address>:27199
    redial: true
    tls: client
# Add the tls-client element.
- tls-client:
    name: client
    rootcas: /etc/tower/certs/ca.crt
    insecureskipverify: false
    cert: /etc/tower/certs/<node IP address or hostname>.crt
    key: /etc/tower/certs/<node IP address or hostname>.key
- work-verification:
    publickey: /etc/tower/certs/signworkpublic.pem
# Add the work-signing and work-verification elements
- work-signing:
    privatekey: /etc/tower/certs/signworkprivate.pem
    tokenexpiration: 30m
# Add the TLS server configuration
- tls-server:
    name: controller
    cert: /etc/tower/certs/<node IP address or hostname>.crt
    key: /etc/tower/certs/<node IP address or hostname>.key
    requireclientcert: true
    clientcas: /etc/tower/certs/ca.crt
- control-service:
    service: control
    filename: /var/run/receptor/receptor.sock
# Set verifysignature to true.
- work-command:
    worktype: local
    command: /var/lib/ol-automation-manager/venv/awx/bin/ansible-runner
    params: worker
    allowruntimeparams: true
    verifysignature: true
In the previous example,
- node IP address or host name is the IP address or host name of the node.
- control hostname or IP address is the host name or IP address of the control plane host you're peering with.
- On each node, restart the Service Mesh and Oracle Linux Automation Manager.
sudo systemctl daemon-reload
sudo systemctl restart receptor-awx
sudo systemctl restart ol-automation-manager
- Verify the Service Mesh. See Viewing the Service Mesh for more information.
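For example, assuming the receptorctl utility described in that section is available, the following command queries the local Receptor socket on any node and lists the node's connections and the other nodes known to the mesh:
sudo receptorctl --socket /var/run/receptor/receptor.sock status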