diff --git a/docs/docsite/rst/administration/instances.rst b/docs/docsite/rst/administration/instances.rst
index 1f3e815811..daa64d8c42 100644
--- a/docs/docsite/rst/administration/instances.rst
+++ b/docs/docsite/rst/administration/instances.rst
@@ -1,7 +1,7 @@
 .. _ag_instances:
 
 Managing Capacity With Instances
-----------------------------------
+=================================
 
 .. index::
    pair: topology;capacity
@@ -9,12 +9,32 @@ Managing Capacity With Instances
    pair: remove;capacity
    pair: add;capacity
 
-Scaling your mesh is only available on Openshift deployments of AWX and is possible through adding or removing nodes from your cluster dynamically, through the **Instances** resource of the AWX User Interface, without running the installation script.
+Scaling your mesh is only available on OpenShift and Kubernetes (K8S) deployments of AWX. You can add or remove nodes from your cluster dynamically through the **Instances** resource of the AWX user interface, without running the installation script.
+
+Instances serve as nodes in your mesh topology. Automation mesh allows you to extend the footprint of your automation: where you launch a job and where the ``ansible-playbook`` runs can be in different locations.
+
+.. image:: ../common/images/instances_mesh_concept.png
+   :alt: Site A pointing to Site B and dotted arrows to two hosts from Site B
+
+Automation mesh is useful for:
+
+- traversing difficult network topologies
+- bringing execution capabilities (the machine running ``ansible-playbook``) closer to your target hosts
+
+The nodes (control, hop, and execution instances) are interconnected via Receptor, forming a virtual mesh.
+
+.. image:: ../common/images/instances_mesh_concept_with_nodes.png
+   :alt: Control node pointing to hop node, which is pointing to two execution nodes.
+
 
 Prerequisites
-~~~~~~~~~~~~~~
+--------------
 
-- The system that is going to run the ``ansible-playbook`` requires the collection ``ansible.receptor`` to be installed:
+- |rhel| (RHEL) or Debian operating system. Bring a machine online with a compatible Red Hat family OS (e.g. RHEL 8 or 9) or Debian 11. This machine requires a static IP, or a resolvable DNS hostname that the AWX cluster can access. If the ``listener_port`` is defined, the machine will also need an available open port on which to establish inbound TCP connections (e.g. 27199).
+
+  In general, the more CPU cores and memory the machine has, the more jobs that can be scheduled to run on that machine at once. See :ref:`ug_job_concurrency` for more information on capacity.
+
+- The system that is going to run the install bundle to set up the remote node requires the collection ``ansible.receptor`` to be installed:
 
   - If machine has access to the internet:
 
@@ -25,34 +45,14 @@ Prerequisites
 
     Installing the receptor collection dependency from the ``requirements.yml`` file will consistently retrieve the receptor version specified there, as well as any other collection dependencies that may be needed in the future.
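+
+    For illustration, a ``requirements.yml`` of this shape would pin the collection (the version shown here is hypothetical; defer to the file shipped with your install bundle):
+
+    ::
+
+      ---
+      collections:
+        - name: ansible.receptor
+          version: "1.4.1"  # hypothetical pin; use the version your bundle specifies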
 
-  - If machine does not have access to the internet, refer to `Downloading a collection from Automation Hub `_ to configure `Automation Hub `_ in Ansible Galaxy locally.
-
-
-- If you are using the default |ee| (provided with AWX) to run on remote execution nodes, you must add a pull secret in AWX that contains the credential for pulling the |ee| image. To do this, create a pull secret on the AWX namespace and configure the ``ee_pull_credentials_secret`` parameter in the Operator:
-
-  1. Create a secret:
-  ::
-
-    oc create secret generic ee-pull-secret \
-     --from-literal=username=<username> \
-     --from-literal=password=<password> \
-     --from-literal=url=registry.redhat.io
-
-  ::
-
-    oc edit awx <instance name>
-
-  2. Add ``ee_pull_credentials_secret ee-pull-secret`` to the spec:
-  ::
-
-    spec.ee_pull_credentials_secret=ee-pull-secret
+  - If the machine does not have access to the internet, refer to `Downloading a collection for offline use `_.
 
 - To manage instances from the AWX user interface, you must have System Administrator or System Auditor permissions.
 
 Manage instances
-~~~~~~~~~~~~~~~~~~
+-----------------
 
 Click **Instances** from the left side navigation menu to access the Instances list.
 
@@ -75,7 +75,7 @@ The Instances list displays all the current nodes in your topology, along with r
 
 - **Provisioning Failure**: a node that failed during provisioning (currently not yet supported and is subject to change in a future release)
 - **De-provisioning Failure**: a node that failed during deprovisioning (currently not yet supported and is subject to change in a future release)
-- **Node Type** specifies whether the node is a control, hybrid, hop, or execution node. See :term:`node` for further detail.
+- **Node Type** specifies whether the node is a control, hop, execution, or hybrid node (hybrid is not applicable to operator-based installations). See :term:`node` for further detail.
 - **Capacity Adjustment** allows you to adjust the number of forks in your nodes
 - **Used Capacity** indicates how much capacity has been used
 - **Actions** allow you to enable or disable the instance to control whether jobs can be assigned to it
 
@@ -87,7 +87,7 @@ From this page, you can add, remove or run health checks on your nodes. Use the
 
 .. note::
 
-    You can still remove an instance even if it is active and jobs are running on it. AWXwill attempt to wait for any jobs running on this node to complete before actually removing it.
+    You can still remove an instance even if it is active and jobs are running on it. AWX will attempt to wait for any jobs running on this node to complete before actually removing it.
 
 Click **Remove** to confirm.
 
@@ -114,11 +114,20 @@ The example health check shows the status updates with an error on node 'one':
 
 Add an instance
-~~~~~~~~~~~~~~~~
-
-One of the ways to expand capacity is to create an instance, which serves as a node in your topology.
+----------------
 
-1. Click **Instances** from the left side navigation menu.
+One of the ways to expand capacity is to create an instance. Standalone execution nodes can be added to run alongside the Kubernetes deployment of AWX. These machines will not be a part of the AWX Kubernetes cluster. The control nodes running in the cluster will connect and submit work to these machines via Receptor. The machines are registered in AWX as "execution" type instances, meaning they will only be used to run AWX jobs, not dispatch work or handle web requests as control nodes do.
+
+Hop nodes can be added to sit between the control plane of AWX and standalone execution nodes. These machines will not be a part of the AWX Kubernetes cluster, and they will be registered in AWX as node type "hop", meaning they will only handle inbound and outbound traffic for otherwise unreachable nodes in a different or more strict network.
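+
+Under the hood, each of these machines runs the Receptor daemon, driven by a small YAML configuration file. As a rough sketch only (the install bundle described below generates the real file; the node ID and port here are placeholders), a hop node's configuration might resemble:
+
+::
+
+  ---
+  - node:
+      id: hop-node.example.org
+
+  - log-level: info
+
+  - tcp-listener:
+      port: 27199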
+
+Below is an example of an AWX task pod with two execution nodes. Traffic to execution node 2 flows through a hop node that is set up between it and the control plane.
+
+.. image:: ../common/images/instances_awx_task_pods_hopnode.png
+   :alt: AWX task pod with a hop node between the control plane of AWX and standalone execution nodes.
+
+To create an instance in AWX:
+
+1. Click **Instances** from the left side navigation menu of the AWX UI.
 
 2. In the Instances list view, click the **Add** button and the Create new Instance window opens.
 
@@ -130,13 +139,30 @@ An instance has several attributes that may be configured:
 
 - Enter a fully qualified domain name (ping-able DNS) or IP address for your instance in the **Host Name** field (required). This field is equivalent to ``hostname`` in the API.
 - Optionally enter a **Description** for the instance
 - The **Instance State** field is auto-populated, indicating that it is being installed, and cannot be modified
-- The **Listener Port** is pre-populated with the most optimal port, however you can change the port to one that is more appropriate for your configuration. This field is equivalent to ``listener_port`` in the API.
-- The **Instance Type** field is auto-populated and cannot be modified. Only execution nodes can be created at this time.
-- Check the **Enable Instance** box to make it available for jobs to run on it
+- Optionally specify the **Listener Port** for Receptor to listen on for incoming connections. This is an open port on the remote machine used to establish inbound TCP connections. This field is equivalent to ``listener_port`` in the API.
+- Select from the options in the **Instance Type** field to specify the type you want to create. Only execution and hop nodes can be created, as operator-based installations do not support hybrid nodes. This field is equivalent to ``node_type`` in the API.
+- In the **Peers** field, select the instance hostnames you want your new instance to connect outbound to.
+- In the **Options** fields:
+
+  - Check the **Enable Instance** box to make it available for jobs to run on an execution node.
+  - Check the **Managed by Policy** box to allow policy to dictate how the instance is assigned.
+  - Check the **Peers from control nodes** box to allow control nodes to peer to this instance automatically. The listener port must be set if this option is enabled or if the instance is a peer.
+
+In the example diagram above, the configurations are as follows:
+
++------------------+---------------+--------------------------+--------------+
+| instance name    | listener_port | peers_from_control_nodes | peers        |
++==================+===============+==========================+==============+
+| execution node 1 | 27199         | true                     | []           |
++------------------+---------------+--------------------------+--------------+
+| hop node         | 27199         | true                     | []           |
++------------------+---------------+--------------------------+--------------+
+| execution node 2 | null          | false                    | ["hop node"] |
++------------------+---------------+--------------------------+--------------+
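+
+Equivalently, instances can be registered through the API. Below is a hedged sketch for "execution node 2" from the table above (the URL, credentials, and hostnames are placeholders, and the ``peers`` field is assumed to accept instance hostnames, mirroring the UI):
+
+::
+
+   curl -X POST https://awx.example.org/api/v2/instances/ \
+        -u admin:password \
+        -H "Content-Type: application/json" \
+        -d '{"hostname": "execution-node-2.example.org", "node_type": "execution", "peers": ["hop-node.example.org"]}'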
 
 3. Once the attributes are configured, click **Save** to proceed.
 
-Upon successful creation, the Details of the created instance opens.
+Upon successful creation, the Details of one of the created instances opens.
 
 .. image:: ../common/images/instances_create_details.png
    :alt: Details of the newly created instance.
 
 .. note::
 
    The following steps 4-8 are intended to be run from any computer that has SSH access to the newly created instance.
 
-4. Click the download button next to the **Install Bundle** field to download the tarball that includes this new instance and the files relevant to install the node into the mesh.
+4. Click the download button next to the **Install Bundle** field to download the tarball that contains the files that allow AWX to make proper TCP connections to the remote machine.
 
 .. image:: ../common/images/instances_install_bundle.png
    :alt: Instance details showing the Download button in the Install Bundle field of the Details tab.
 
-5. Extract the downloaded ``tar.gz`` file from the location you downloaded it. The install bundle contains yaml files, certificates, and keys that will be used in the installation process.
+5. Extract the downloaded ``tar.gz`` file in the location where you downloaded it. The install bundle contains TLS certificates and keys, a certificate authority, and a Receptor configuration file. To place these files in the right locations on the remote machine, the install bundle includes an ``install_receptor.yml`` playbook. The playbook requires the Receptor collection, which can be installed with:
+
+::
+
+   ansible-galaxy collection install -r requirements.yml
 
 6. Before running the ``ansible-playbook`` command, edit the following fields in the ``inventory.yml`` file:
 
@@ -177,17 +207,89 @@ The content of the ``inventory.yml`` file serves as a template and contains vari
 
    ansible-playbook -i inventory.yml install_receptor.yml
 
+Wait a few minutes for the periodic AWX task to run a health check against the new instance. You may also run a health check at any time by selecting the node and clicking the **Run health check** button on its Details page. Once the instances endpoint or page reports a "Ready" status for the instance, jobs are ready to run on that machine.
 
-9. To view other instances within the same topology, click the **Peers** tab associated with the control node.
-
-.. note::
-
-    You will only be able to view peers of the control plane nodes at this time, which are the execution nodes. Since you are limited to creating execution nodes in this release, you will be unable to create or view peers of execution nodes.
-
+9. To view other instances within the same topology or associate peers, click the **Peers** tab.
 
 .. image:: ../common/images/instances_peers_tab.png
    :alt: "Peers" tab showing two peers.
 
-You may run a health check by selecting the node and clicking the **Run health check** button from its Details page.
+To associate peers with your node, click the **Associate** button to open a dialog box of instances eligible for peering.
+
+.. image:: ../common/images/instances_associate_peer.png
+   :alt: Instances available to peer with the example hop node.
+
+Execution nodes can peer with either hop nodes or other execution nodes. Hop nodes can only peer with execution nodes unless you check the **Peers from control nodes** check box in the **Options** field.
+
+.. note::
+
+   If you associate or disassociate a peer, a notification will inform you to re-run the install bundle from the Peer Detail view (the :ref:`ag_topology_viewer` has the download link).
+
+   .. image:: ../common/images/instances_associate_peer_reinstallmsg.png
+      :alt: Notification to re-run the installation bundle due to a change in the peering.
+
+You can remove an instance by clicking **Remove** on the Instances page, or by setting the instance ``node_state = deprovisioning`` via the API. Upon deletion, a pop-up message notifies you that you may need to re-run the install bundle to ensure that anything removed is no longer connected.
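+
+As a sketch, the API route might be exercised as follows (the URL, instance ID, and credentials are placeholders):
+
+::
+
+   curl -X PATCH https://awx.example.org/api/v2/instances/42/ \
+        -u admin:password \
+        -H "Content-Type: application/json" \
+        -d '{"node_state": "deprovisioning"}'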
 
 10. To view a graphical representation of your updated topology, refer to the :ref:`ag_topology_viewer` section of this guide.
+
+
+Using a custom Receptor CA
+---------------------------
+
+The control nodes on the K8S cluster will communicate with execution nodes via mutual TLS TCP connections, running via Receptor. Execution nodes will verify incoming connections by ensuring the x509 certificate was issued by a trusted Certificate Authority (CA).
+
+You may choose to provide your own CA for this validation. If no CA is provided, the AWX operator will automatically generate one using OpenSSL.
+
+Given custom ``ca.crt`` and ``ca.key`` stored locally, run the following:
+
+::
+
+  kubectl create secret tls awx-demo-receptor-ca \
+    --cert=/path/to/ca.crt --key=/path/to/ca.key
+
+The secret must be named ``{AWX Custom Resource name}-receptor-ca``. In the example above, the AWX Custom Resource name is "awx-demo"; replace "awx-demo" with your own AWX Custom Resource name.
+
+If this secret is created after AWX is deployed, run the following to restart the deployment:
+
+::
+
+  kubectl rollout restart deployment awx-demo
+
+
+.. note::
+
+   Changing the Receptor CA will sever connections to any existing execution nodes. These nodes will enter an *Unavailable* state, and jobs will not be able to run on them. You will need to download and re-run the install bundle for each execution node. This will replace the TLS certificate files with those signed by the new CA. The execution nodes will then appear in a *Ready* state after a few minutes.
+
+Troubleshooting
+----------------
+
+If you encounter issues while setting up instances, refer to these troubleshooting tips.
+
+Fact cache not working
+~~~~~~~~~~~~~~~~~~~~~~~
+
+Make sure the system timezone on the execution node matches ``settings.TIME_ZONE`` (default is 'UTC') on AWX. Fact caching relies on comparing the modified times of artifact files, and these modified times are not timezone-aware. Therefore, it is critical that the timezones of the execution nodes match AWX's timezone setting.
+
+To set the system timezone to UTC:
+
+::
+
+  ln -s /usr/share/zoneinfo/Etc/UTC /etc/localtime
+
+
+Permission denied errors
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Jobs may fail with the following error, or similar:
+
+::
+
+  "msg":"exec container process `/usr/local/bin/entrypoint`: Permission denied"
+
+
+For RHEL-based machines, this could be due to SELinux being enabled on the system. You can pass the following container options via ``extra_settings`` to override the SELinux protections:
+
+::
+
+  DEFAULT_CONTAINER_RUN_OPTIONS = ['--network', 'slirp4netns:enable_ipv6=true', '--security-opt', 'label=disable']
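+
+As a sketch, these options might be supplied through the AWX custom resource spec, assuming the operator's ``extra_settings`` list convention (the value formatting here is illustrative):
+
+::
+
+  spec:
+    extra_settings:
+      - setting: DEFAULT_CONTAINER_RUN_OPTIONS
+        value: "['--network', 'slirp4netns:enable_ipv6=true', '--security-opt', 'label=disable']"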
diff --git a/docs/docsite/rst/administration/topology_viewer.rst b/docs/docsite/rst/administration/topology_viewer.rst
index d2782335cb..012669ba43 100644
--- a/docs/docsite/rst/administration/topology_viewer.rst
+++ b/docs/docsite/rst/administration/topology_viewer.rst
@@ -19,7 +19,7 @@ The Topology View opens and displays a graphic representation of how each recep
 
 .. image:: ../common/images/topology-viewer-initial-view.png
 
-2. To adjust the zoom levels, or manipulate the graphic views, use the control buttons on the upper right-hand corner of the window.
+2. To adjust the zoom levels, refresh the current view, or manipulate the graphic views, use the control buttons on the upper right-hand corner of the window.
 
 .. image:: ../common/images/topology-viewer-view-controls.png
diff --git a/docs/docsite/rst/common/images/instances_associate_peer.png b/docs/docsite/rst/common/images/instances_associate_peer.png
new file mode 100644
index 0000000000..397d7c2916
Binary files /dev/null and b/docs/docsite/rst/common/images/instances_associate_peer.png differ
diff --git a/docs/docsite/rst/common/images/instances_associate_peer_reinstallmsg.png b/docs/docsite/rst/common/images/instances_associate_peer_reinstallmsg.png
new file mode 100644
index 0000000000..8bb32223df
Binary files /dev/null and b/docs/docsite/rst/common/images/instances_associate_peer_reinstallmsg.png differ
diff --git a/docs/docsite/rst/common/images/instances_awx_task_pods_hopnode.png b/docs/docsite/rst/common/images/instances_awx_task_pods_hopnode.png
new file mode 100644
index 0000000000..c9b65e64dc
Binary files /dev/null and b/docs/docsite/rst/common/images/instances_awx_task_pods_hopnode.png differ
diff --git a/docs/docsite/rst/common/images/instances_create_details.png b/docs/docsite/rst/common/images/instances_create_details.png
index 068f0a6dd2..0ceefefaf5 100644
Binary files a/docs/docsite/rst/common/images/instances_create_details.png and b/docs/docsite/rst/common/images/instances_create_details.png differ
diff --git a/docs/docsite/rst/common/images/instances_create_new.png b/docs/docsite/rst/common/images/instances_create_new.png
index 459c7b368f..5e69c6f7f5 100644
Binary files a/docs/docsite/rst/common/images/instances_create_new.png and b/docs/docsite/rst/common/images/instances_create_new.png differ
diff --git a/docs/docsite/rst/common/images/instances_delete_prompt.png b/docs/docsite/rst/common/images/instances_delete_prompt.png
index d1e446b05a..dd1268407b 100644
Binary files a/docs/docsite/rst/common/images/instances_delete_prompt.png and b/docs/docsite/rst/common/images/instances_delete_prompt.png differ
diff --git a/docs/docsite/rst/common/images/instances_health_check.png b/docs/docsite/rst/common/images/instances_health_check.png
index d191d08d83..0918b5f635 100644
Binary files a/docs/docsite/rst/common/images/instances_health_check.png and b/docs/docsite/rst/common/images/instances_health_check.png differ
diff --git a/docs/docsite/rst/common/images/instances_health_check_pending.png b/docs/docsite/rst/common/images/instances_health_check_pending.png
index 218d76a975..3a8cd7a29d 100644
Binary files a/docs/docsite/rst/common/images/instances_health_check_pending.png and b/docs/docsite/rst/common/images/instances_health_check_pending.png differ
diff --git a/docs/docsite/rst/common/images/instances_install_bundle.png b/docs/docsite/rst/common/images/instances_install_bundle.png
index f876c92e51..e45f79da85 100644
Binary files a/docs/docsite/rst/common/images/instances_install_bundle.png and b/docs/docsite/rst/common/images/instances_install_bundle.png differ
diff --git a/docs/docsite/rst/common/images/instances_list_view.png b/docs/docsite/rst/common/images/instances_list_view.png
index 212f703609..e66eb45726 100644
Binary files a/docs/docsite/rst/common/images/instances_list_view.png and b/docs/docsite/rst/common/images/instances_list_view.png differ
diff --git a/docs/docsite/rst/common/images/instances_mesh_concept.png b/docs/docsite/rst/common/images/instances_mesh_concept.png
new file mode 100644
index 0000000000..e87ab24504
Binary files /dev/null and b/docs/docsite/rst/common/images/instances_mesh_concept.png differ
diff --git a/docs/docsite/rst/common/images/instances_mesh_concept_with_nodes.png b/docs/docsite/rst/common/images/instances_mesh_concept_with_nodes.png
new file mode 100644
index 0000000000..3bb6d982cb
Binary files /dev/null and b/docs/docsite/rst/common/images/instances_mesh_concept_with_nodes.png differ
diff --git a/docs/docsite/rst/common/images/instances_peers_tab.png b/docs/docsite/rst/common/images/instances_peers_tab.png
index 20008b34a1..c75c17de8d 100644
Binary files a/docs/docsite/rst/common/images/instances_peers_tab.png and b/docs/docsite/rst/common/images/instances_peers_tab.png differ
diff --git a/docs/docsite/rst/common/images/topology-viewer-initial-view.png b/docs/docsite/rst/common/images/topology-viewer-initial-view.png
index c7205a23b8..fdb758a735 100644
Binary files a/docs/docsite/rst/common/images/topology-viewer-initial-view.png and b/docs/docsite/rst/common/images/topology-viewer-initial-view.png differ
diff --git a/docs/docsite/rst/common/images/topology-viewer-instance-details.png b/docs/docsite/rst/common/images/topology-viewer-instance-details.png
index b5fa0d4af7..3fa83f58b5 100644
Binary files a/docs/docsite/rst/common/images/topology-viewer-instance-details.png and b/docs/docsite/rst/common/images/topology-viewer-instance-details.png differ
diff --git a/docs/docsite/rst/common/images/topology-viewer-node-hover-click.png b/docs/docsite/rst/common/images/topology-viewer-node-hover-click.png
index a10bf64fec..9371a5231b 100644
Binary files a/docs/docsite/rst/common/images/topology-viewer-node-hover-click.png and b/docs/docsite/rst/common/images/topology-viewer-node-hover-click.png differ
diff --git a/docs/docsite/rst/common/images/topology-viewer-node-view.png b/docs/docsite/rst/common/images/topology-viewer-node-view.png
index 26b1019ac5..71447e71c6 100644
Binary files a/docs/docsite/rst/common/images/topology-viewer-node-view.png and b/docs/docsite/rst/common/images/topology-viewer-node-view.png differ
diff --git a/docs/docsite/rst/common/images/topology-viewer-view-controls.png b/docs/docsite/rst/common/images/topology-viewer-view-controls.png
index 1995a4b76e..55b59b0e0f 100644
Binary files a/docs/docsite/rst/common/images/topology-viewer-view-controls.png and b/docs/docsite/rst/common/images/topology-viewer-view-controls.png differ
diff --git a/docs/docsite/rst/common/images/topology-viewer-zoomed-view.png b/docs/docsite/rst/common/images/topology-viewer-zoomed-view.png
index e6bfd8ae78..61eeecb460 100644
Binary files a/docs/docsite/rst/common/images/topology-viewer-zoomed-view.png and b/docs/docsite/rst/common/images/topology-viewer-zoomed-view.png differ
diff --git a/docs/docsite/rst/userguide/glossary.rst b/docs/docsite/rst/userguide/glossary.rst
index b7d5ae73f6..c0cf2749b0 100644
--- a/docs/docsite/rst/userguide/glossary.rst
+++ b/docs/docsite/rst/userguide/glossary.rst
@@ -90,17 +90,18 @@ Glossary
 
    Node
      A node corresponds to entries in the instance database model, or the ``/api/v2/instances/`` endpoint, and is a machine participating in the cluster / mesh. The unified jobs API reports ``awx_node`` and ``execution_node`` fields. The execution node is where the job runs, and the AWX node interfaces between the job and server functions.
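+
+     As a hedged illustration (the URL, job ID, and credentials are placeholders), both fields can be read from a job's API detail view:
+
+     ::
+
+        curl -s -u admin:password https://awx.example.org/api/v2/jobs/42/ | jq '.awx_node, .execution_node'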
 
-     +-----------+------------------------------------------------------------------------------------------------------+
-     | Node Type | Description                                                                                          |
-     +===========+======================================================================================================+
-     | Control   | Nodes that run persistent |aap| services, and delegate jobs to hybrid and execution nodes           |
-     +-----------+------------------------------------------------------------------------------------------------------+
-     | Hybrid    | Nodes that run persistent |aap| services and execute jobs                                           |
-     +-----------+------------------------------------------------------------------------------------------------------+
-     | Hop       | Used for relaying across the mesh only                                                              |
-     +-----------+------------------------------------------------------------------------------------------------------+
-     | Execution | Nodes that run jobs delivered from control nodes (jobs submitted from the user's Ansible automation) |
-     +-----------+------------------------------------------------------------------------------------------------------+
+     +-----------+-----------------------------------------------------------------------------------------------------------------+
+     | Node Type | Description                                                                                                     |
+     +===========+=================================================================================================================+
+     | Control   | Nodes that run persistent Ansible Automation Platform services, and delegate jobs to hybrid and execution nodes |
+     +-----------+-----------------------------------------------------------------------------------------------------------------+
+     | Hybrid    | Nodes that run persistent Ansible Automation Platform services and execute jobs                                 |
+     |           | (not applicable to operator-based installations)                                                                |
+     +-----------+-----------------------------------------------------------------------------------------------------------------+
+     | Hop       | Used for relaying across the mesh only                                                                          |
+     +-----------+-----------------------------------------------------------------------------------------------------------------+
+     | Execution | Nodes that run jobs delivered from control nodes (jobs submitted from the user's Ansible automation)            |
+     +-----------+-----------------------------------------------------------------------------------------------------------------+
 
    Notification Template
      An instance of a notification type (Email, Slack, Webhook, etc.) with a name, description, and a defined configuration.