Adding execution nodes to AWX
Stand-alone execution nodes can be added to run alongside the Kubernetes deployment of AWX. These machines will not be a part of the AWX Kubernetes cluster. The control nodes running in the cluster will connect and submit work to these machines via Receptor. The machines will be registered in AWX as type "execution" instances, meaning they will only be used to run AWX Jobs (i.e. they will not dispatch work or handle web requests as control nodes do).
Below is an example of a single AWX pod connecting to two different execution nodes. For each execution node, the awx-ee container makes an outbound TCP connection to the machine via Receptor.
                                     AWX POD
                                 ┌──────────────┐
                                 │              │
                                 │ ┌──────────┐ │
┌──────────────────┐             │ │ awx-task │ │
│ execution node 1 │◄────┐       │ ├──────────┤ │
├──────────────────┤     ├───────┼─┤ awx-ee   │ │
│ execution node 2 │◄────┘       │ ├──────────┤ │
└──────────────────┘  Receptor   │ │ awx-web  │ │
                          TCP      │ └──────────┘ │
                         Peers     │              │
                                 └──────────────┘
Note that if the AWX deployment is scaled up, each new AWX pod will also make TCP connections to every execution node.
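Because every AWX pod peers with every execution node, the number of Receptor TCP connections grows as pods × execution nodes. The minimal Python sketch below enumerates those peer pairs, which can help when planning firewall rules; the pod names and hostnames are hypothetical placeholders.

```python
from itertools import product


def receptor_peers(awx_pods: list[str], execution_nodes: list[str]) -> list[tuple[str, str]]:
    """Return every (AWX pod, execution node) pair that needs a Receptor TCP connection.

    Each pod's awx-ee container connects outbound to every execution node,
    so the pairs form the full cross product of pods and nodes.
    """
    return list(product(awx_pods, execution_nodes))


# Hypothetical names, for illustration only.
pods = ["awx-pod-1", "awx-pod-2"]
nodes = ["exec1.example.com", "exec2.example.com"]

for pod, node in receptor_peers(pods, nodes):
    print(f"{pod} -> {node}")
```

With one pod and two execution nodes there are 2 connections; scaling to two pods, as in this example, doubles that to 4.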
Overview
Adding an execution instance involves a handful of steps:
- Start a machine that is accessible from the k8s cluster (Red Hat family operating systems are supported).
- Create a new AWX Instance with its hostname set to the IP address or DNS name of your remote machine.
- Download the install bundle for this newly created instance.
- Run the install bundle playbook against your remote machine.
- Wait for the instance to report a Ready state. Now jobs can run on that instance.
Start machine
Bring a machine online with a compatible Red Hat family OS (e.g. RHEL 8 or 9). This machine needs a static IP address, or a resolvable DNS hostname that the AWX cluster can reach. The machine also needs an open port on which to accept inbound TCP connections (default is 27199).
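Before going further, it can help to check the listener port from a machine on the cluster's network. The sketch below is a hypothetical pre-flight helper, not part of AWX: it only classifies what happens when a TCP connection is attempted. Until Receptor is installed nothing listens on the port, so "refused" is the usual healthy answer at this stage, while "timeout" often points at a firewall silently dropping packets.

```python
import socket

DEFAULT_LISTENER_PORT = 27199  # default listener port for execution nodes


def probe_port(host: str, port: int = DEFAULT_LISTENER_PORT, timeout: float = 3.0) -> str:
    """Classify TCP reachability of host:port.

    Returns:
      "open"    - a TCP connection was established (something is listening)
      "refused" - the host answered but nothing is listening on the port
      "timeout" - no answer at all; often a firewall dropping packets
      "error"   - DNS failure, no route to host, and similar problems
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "timeout"
    except OSError:
        return "error"
```

For example, probe_port("exec1.example.com") (a placeholder hostname) should return "open" once Receptor is running on the node.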
In general, the more CPU cores and memory the machine has, the more jobs can be scheduled to run on it at once. See https://docs.ansible.com/automation-controller/4.2.1/html/userguide/jobs.html#at-capacity-determination-and-job-impact for more information on capacity.
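As a rough illustration of the capacity algorithm on the linked page, the sketch below assumes its documented defaults: 4 forks per CPU core, 100 MB of memory per fork with 2048 MB reserved for the system, and a capacity adjustment between 0.0 (use the smaller of the two estimates) and 1.0 (use the larger). Treat these constants and the function name as assumptions and check the documentation for your AWX version.

```python
def instance_capacity(cpus: int, mem_mb: int, adjustment: float = 1.0,
                      forks_per_cpu: int = 4, mem_per_fork_mb: int = 100,
                      reserved_mem_mb: int = 2048) -> int:
    """Estimate how many forks an execution node can run at once (assumed defaults)."""
    if not 0.0 <= adjustment <= 1.0:
        raise ValueError("adjustment must be between 0.0 and 1.0")
    # CPU-relative capacity: each core is assumed to handle a fixed number of forks.
    cpu_capacity = cpus * forks_per_cpu
    # Memory-relative capacity: memory left after the system reserve, divided per fork.
    mem_capacity = max(0, (mem_mb - reserved_mem_mb) // mem_per_fork_mb)
    low, high = sorted((cpu_capacity, mem_capacity))
    # Interpolate between the conservative and the aggressive estimate.
    return low + int((high - low) * adjustment)
```

For example, a 4-core machine with 4096 MB of memory gives 16 forks by CPU and 20 by memory, so instance_capacity(4, 4096) returns 20 and instance_capacity(4, 4096, adjustment=0.0) returns 16.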
Create instance in AWX
Use the Instances page or the api/v2/instances endpoint to add a new instance with the following fields:
- hostname ("Name" in the UI) is the IP address or DNS name of your machine.
- node_type is "execution".
- node_state is "installed".
- listener_port is an open port on the remote machine used to establish inbound TCP connections. Defaults to 27199.
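The API call can be sketched with Python's standard library. The AWX URL, the token, and the use of a Bearer token for authentication are assumptions for illustration; the field names are the ones described above. The function only builds the request, so it can be inspected before anything is sent.

```python
import json
import urllib.request

AWX_URL = "https://awx.example.com"  # hypothetical AWX base URL
TOKEN = "REPLACE_ME"                 # hypothetical OAuth2 personal access token


def build_create_instance_request(hostname: str, listener_port: int = 27199) -> urllib.request.Request:
    """Build (but do not send) a POST to api/v2/instances/ registering an execution node."""
    payload = {
        "hostname": hostname,        # IP address or DNS name of the remote machine
        "node_type": "execution",    # runs jobs only; no dispatching or web requests
        "node_state": "installed",
        "listener_port": listener_port,
    }
    return urllib.request.Request(
        f"{AWX_URL}/api/v2/instances/",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {TOKEN}",
        },
        method="POST",
    )
```

Sending it is then a matter of urllib.request.urlopen(req); the JSON response should include the new instance's id, which the later steps refer to as {id}.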
Download the install bundle
On the Instance Details page, click Install Bundle, save the tar.gz file to your local computer, and extract its contents. Alternatively, make a GET request to api/v2/instances/{id}/install_bundle and save the binary output to a tar.gz file.
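Scripting the API route could look like the following sketch. The base URL, token, Bearer authentication, and the trailing slash on the endpoint are assumptions; save_and_extract writes the downloaded bytes to a tar.gz file and unpacks it with Python's tarfile module, using the safer "data" extraction filter when the running Python provides it.

```python
import tarfile
import urllib.request
from pathlib import Path

AWX_URL = "https://awx.example.com"  # hypothetical AWX base URL
TOKEN = "REPLACE_ME"                 # hypothetical OAuth2 personal access token


def download_bundle(instance_id: int) -> bytes:
    """GET api/v2/instances/{id}/install_bundle/ and return the raw tar.gz bytes."""
    req = urllib.request.Request(
        f"{AWX_URL}/api/v2/instances/{instance_id}/install_bundle/",
        headers={"Authorization": f"Bearer {TOKEN}"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()


def save_and_extract(data: bytes, archive: Path, dest: Path) -> list[str]:
    """Write the bundle to `archive`, extract it into `dest`, and return member names."""
    archive.write_bytes(data)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, mode="r:gz") as tar:
        if hasattr(tarfile, "data_filter"):
            # Rejects absolute paths and links that escape dest.
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)
        return tar.getnames()
```

A typical call would be save_and_extract(download_bundle(42), Path("bundle.tar.gz"), Path("bundle")), where 42 stands in for your instance id.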
Run the install bundle playbook
In order for AWX to make proper TCP connections to the remote machine, a few files need to be in place. These include TLS certificates and keys, a certificate authority, and a proper Receptor configuration file. To make sure these files end up in the right location on the remote machine, the install bundle includes an install_receptor.yml playbook.
The playbook requires the Receptor collection, which can be installed with:
ansible-galaxy collection install -r requirements.yml
Modify inventory.yml: set ansible_user and any other Ansible variables needed to run playbooks against the remote machine.
Then run the following to install Receptor on the remote machine:
ansible-playbook -i inventory.yml install_receptor.yml
Note that the playbook enables the ansible-awx/receptor Copr repository so that Receptor can be installed.
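If you prefer to script this step, the two commands above can be driven from Python with subprocess. This is a sketch that assumes ansible-galaxy and ansible-playbook are on PATH and that inventory.yml has already been edited; the function names are hypothetical.

```python
import subprocess
from pathlib import Path


def install_commands() -> list[list[str]]:
    """The two commands from the install bundle instructions, as argv lists."""
    return [
        # Install the Receptor collection the playbook depends on.
        ["ansible-galaxy", "collection", "install", "-r", "requirements.yml"],
        # Run the bundled playbook against the host(s) in inventory.yml.
        ["ansible-playbook", "-i", "inventory.yml", "install_receptor.yml"],
    ]


def run_install(bundle_dir: Path) -> None:
    """Run each command from inside the extracted bundle; stop on the first failure."""
    for argv in install_commands():
        # check=True raises CalledProcessError if a command exits non-zero.
        subprocess.run(argv, cwd=bundle_dir, check=True)
```

Call run_install with the directory where you extracted the bundle, since both commands use paths relative to it.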
Wait for instance to be Ready
Wait a few minutes for the periodic AWX task to run a health check against the new instance. Once the Instances page or the instances endpoint reports a "Ready" status for the instance, jobs can run on this machine.
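Rather than refreshing the page, you can poll the instance. The sketch below keeps the HTTP call behind a caller-supplied function so the polling logic stays testable. It assumes the instance detail at api/v2/instances/{id}/ exposes the state as node_state with the value "ready"; confirm that against your AWX version.

```python
import time
from typing import Callable


def wait_until_ready(get_node_state: Callable[[], str],
                     timeout: float = 600.0,
                     interval: float = 15.0,
                     sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll until get_node_state() returns "ready" or roughly `timeout` seconds pass.

    get_node_state is supplied by the caller, e.g. a function that GETs
    api/v2/instances/{id}/ and returns the JSON "node_state" field.
    Only sleep time is counted, so the timeout is approximate.
    """
    waited = 0.0
    while True:
        if get_node_state() == "ready":
            return True
        if waited >= timeout:
            return False
        sleep(interval)
        waited += interval
```

The default of 10 minutes is an arbitrary choice that leaves room for several periodic health checks.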
Removing instances
You can remove an instance by clicking "Remove" in the Instances page, or by setting the instance node_state to "deprovisioning" via the API.
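The API route can be sketched as a PATCH that sets node_state on the instance. The base URL, token, Bearer authentication, and the api/v2/instances/{id}/ detail path are assumptions for illustration; the function only builds the request.

```python
import json
import urllib.request

AWX_URL = "https://awx.example.com"  # hypothetical AWX base URL
TOKEN = "REPLACE_ME"                 # hypothetical OAuth2 personal access token


def build_deprovision_request(instance_id: int) -> urllib.request.Request:
    """Build (but do not send) a PATCH that marks an instance for removal."""
    return urllib.request.Request(
        f"{AWX_URL}/api/v2/instances/{instance_id}/",
        data=json.dumps({"node_state": "deprovisioning"}).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {TOKEN}",
        },
        method="PATCH",
    )
```

Send it with urllib.request.urlopen(build_deprovision_request(instance_id)).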
Troubleshooting
Fact cache not working
Make sure the system timezone on the execution node matches settings.TIME_ZONE (default is 'UTC') on AWX.
Fact caching relies on comparing modified times of artifact files, and these modified times are not timezone-aware. Therefore, it is critical that the timezones of the execution nodes match AWX's timezone setting.
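A quick way to spot a mismatch is to compare UTC offsets at the same instant. The sketch below is a hypothetical helper, not part of AWX. Comparing offsets rather than zone names treats zones such as Etc/UTC and UTC as equal, but because it checks a single instant, two zones that differ only in daylight-saving rules could match today and diverge later.

```python
from datetime import datetime, timezone, tzinfo
from typing import Optional


def timezones_match(node_tz: tzinfo, awx_tz: tzinfo, at: Optional[datetime] = None) -> bool:
    """Return True if both timezones have the same UTC offset at instant `at` (default: now)."""
    # `at` must be timezone-aware so astimezone() converts rather than guesses.
    at = at or datetime.now(timezone.utc)
    return at.astimezone(node_tz).utcoffset() == at.astimezone(awx_tz).utcoffset()
```

On the execution node, timezones_match(datetime.now().astimezone().tzinfo, ZoneInfo("UTC")), with ZoneInfo imported from the standard zoneinfo module, checks the local zone against AWX's default 'UTC'.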
To set the system timezone to UTC (run as root):
ln -sf /usr/share/zoneinfo/Etc/UTC /etc/localtime
Permission denied errors
Jobs may fail with the following error, or a similar one:
"msg":"exec container process `/usr/local/bin/entrypoint`: Permission denied"
On RHEL-based machines, this may be caused by SELinux being enabled on the system.
You can pass these container options via extra_settings to override SELinux protections:
DEFAULT_CONTAINER_RUN_OPTIONS = ['--network', 'slirp4netns:enable_ipv6=true', '--security-opt', 'label=disable']
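Before changing settings, you can check whether SELinux is actually enforcing on the execution node. On Linux, /sys/fs/selinux/enforce contains 1 when SELinux is enforcing and 0 when it is permissive, and the file is absent when SELinux is disabled (getenforce reports the same thing). The sketch below reads that file; the function name is hypothetical.

```python
from pathlib import Path

SELINUX_ENFORCE = Path("/sys/fs/selinux/enforce")


def selinux_mode(enforce_file: Path = SELINUX_ENFORCE) -> str:
    """Return "enforcing", "permissive", or "disabled" based on the selinuxfs enforce flag."""
    try:
        flag = enforce_file.read_text().strip()
    except FileNotFoundError:
        # selinuxfs is not mounted, which normally means SELinux is disabled.
        return "disabled"
    return "enforcing" if flag == "1" else "permissive"
```

If selinux_mode() returns "enforcing", the label=disable option above is the relevant override; with "permissive" or "disabled", the Permission denied error likely has another cause.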