Mirror of https://github.com/ansible/awx.git (synced 2026-01-10 15:32:07 -03:30)
Update Tasks doc file with Receptor work unit information
Parent: b9131b9e8b
Commit: 2d87ccface
@@ -72,14 +72,23 @@ Recommendations and constraints:
### Provisioning and Deprovisioning Instances and Groups
* **Provisioning** - Provisioning instances after installation is supported by updating the `inventory` file and re-running the setup playbook. It's important that this file contain all passwords and related information used when installing the cluster; otherwise, other instances may be reconfigured (which can also be done intentionally).

* **Deprovisioning** - Prior to version 19.3.0, AWX did not automatically deprovision instances, since it could not tell whether an instance had been taken offline intentionally or had gone offline due to a failure. Instead, the procedure for deprovisioning an instance was to shut it down (or stop the `automation-controller-service`) and run the AWX deprovision command:
```
$ awx-manage deprovision_instance --hostname=<hostname>
```
Starting with AWX version 19.3.0, deprovisioning a node requires updating one or more Receptor configurations across one or more nodes, so it cannot be done by hand; the Automation Mesh Installer must deprovision the nodes instead.

Adding nodes to and removing nodes from the mesh does not require every node to be listed in the inventory file; in other words, the absence of a node from the inventory file _does not_ indicate that the node should be removed. Instead, a `hostvar` of `node_state: deprovision` tells the mesh installer that the node should be deprovisioned.

Once a node is identified as a candidate for deprovisioning, the following happens "behind the scenes":
- Receptor is disabled via `ssh`
- The instance is deleted directly from the database
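The selection rule and the two steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not the mesh installer's actual code: the inventory is assumed to be a plain mapping of hostname to hostvars, and `disable_receptor` and `delete_instance` are hypothetical callables standing in for the `ssh` step and the database deletion.

```python
from __future__ import annotations

from typing import Callable, Mapping


def nodes_to_deprovision(inventory: Mapping[str, Mapping[str, object]]) -> list[str]:
    """Return the hosts explicitly marked with node_state: deprovision.

    A host that is simply missing from the inventory is never selected:
    absence is not treated as a removal request.
    """
    return sorted(
        host
        for host, hostvars in inventory.items()
        if hostvars.get("node_state") == "deprovision"
    )


def deprovision(
    inventory: Mapping[str, Mapping[str, object]],
    disable_receptor: Callable[[str], None],  # hypothetical stand-in for the ssh step
    delete_instance: Callable[[str], None],   # hypothetical stand-in for the DB delete
) -> list[str]:
    """Run both deprovisioning steps, in list order, for every marked host."""
    removed = []
    for host in nodes_to_deprovision(inventory):
        disable_receptor(host)  # 1. disable Receptor on the node
        delete_instance(host)   # 2. delete the instance record
        removed.append(host)
    return removed


if __name__ == "__main__":
    inventory = {
        "exec1.example.com": {},  # no node_state: left untouched
        "exec2.example.com": {"node_state": "deprovision"},
    }
    calls = []
    removed = deprovision(
        inventory,
        disable_receptor=lambda h: calls.append(("disable_receptor", h)),
        delete_instance=lambda h: calls.append(("delete_instance", h)),
    )
    print(removed)  # ['exec2.example.com']
    print(calls)
```

Injecting the two steps as callables keeps the selection logic testable without a live mesh or database; the ordering simply mirrors the list above.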
* **Removing/Deprovisioning Instance Groups** - AWX does not automatically deprovision or remove instance groups, even though re-provisioning will often leave them unused. They may still show up in API endpoints and stats monitoring. These groups can be removed with the following command:
```
@@ -157,10 +157,14 @@ One of the most important tasks in a clustered AWX installation is the periodic
If a node in an AWX cluster discovers that one of its peers has not updated its heartbeat within a certain grace period, it is assumed to be offline, and its capacity is set to zero to avoid scheduling new tasks on that node. Additionally, jobs allegedly running or scheduled to run on that node are assumed to be lost, and "reaped", or marked as failed.
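The heartbeat check described above can be sketched as follows. The `Node` and `Job` classes, the `grace` parameter, and the status names `waiting` and `running` (standing in for "scheduled to run" and "running") are assumptions for illustration; this is not AWX's actual implementation.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Node:
    hostname: str
    last_heartbeat: datetime
    capacity: int


@dataclass
class Job:
    job_id: int
    node: str
    status: str


# Assumed statuses for jobs scheduled on, or running on, a node.
IN_FLIGHT = {"waiting", "running"}


def reap_lost_nodes(nodes: list[Node], jobs: list[Job], now: datetime, grace: timedelta) -> list[str]:
    """Zero the capacity of stale peers and mark their in-flight jobs as failed.

    Returns the hostnames that were considered offline.
    """
    lost = {n.hostname for n in nodes if now - n.last_heartbeat > grace}
    for node in nodes:
        if node.hostname in lost:
            node.capacity = 0  # stop scheduling new tasks on this node
    for job in jobs:
        if job.node in lost and job.status in IN_FLIGHT:
            job.status = "failed"  # the job is "reaped"
    return sorted(lost)


if __name__ == "__main__":
    now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    nodes = [
        Node("awx-1", now - timedelta(seconds=30), capacity=100),
        Node("awx-2", now - timedelta(minutes=10), capacity=100),
    ]
    jobs = [Job(1, "awx-1", "running"), Job(2, "awx-2", "running"), Job(3, "awx-2", "canceled")]
    print(reap_lost_nodes(nodes, jobs, now, grace=timedelta(minutes=2)))  # ['awx-2']
    print([(j.job_id, j.status) for j in jobs])
```

Jobs that had already finished on the lost node (job 3 here) are left alone; only work that was in flight is marked as failed.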
## Reaping Receptor Work Units
Each AWX job launch starts a Receptor work unit. This work unit handles the `stdin`, `stdout`, and `status` of the job running on the mesh, and it also writes this data to disk.

Files such as `status`, `stdin`, and `stdout` are created in a Receptor directory named with a randomly generated 8-character string (_e.g._ `qLL2JFNT`). This string is also the work unit ID in Receptor, and is used in various Receptor commands (_e.g._ `work results qLL2JFNT`).

The files that the work unit writes to disk are cleaned up after the AWX job finishes by issuing the `work release` command. However, the release process might fail, and if AWX crashes during a job's execution, the `work release` command is never issued at all.

Because of this, a periodic task obtains a list of all Receptor work units and finds the ones that belong to AWX jobs in a completed state (that is, with a status of `canceled`, `error`, or `succeeded`). This task then calls `work release` on each of these work units to clean up their files on disk.
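A minimal sketch of the periodic reaper, in Python. It assumes the work unit IDs have already been fetched (in a real deployment, from Receptor's `work list`), that `job_status_for_unit` is a hypothetical lookup returning the status of the AWX job that owns a unit (or `None` if no AWX job owns it), and that `release` is a hypothetical callable that issues `work release` for a unit. The completed statuses are the ones listed above; this is not AWX's implementation.

```python
from __future__ import annotations

from typing import Callable, Iterable, Optional

# Completed statuses, as listed in the paragraph above.
COMPLETED_STATES = frozenset({"canceled", "error", "succeeded"})


def reap_work_units(
    work_unit_ids: Iterable[str],
    job_status_for_unit: Callable[[str], Optional[str]],  # hypothetical lookup
    release: Callable[[str], None],  # hypothetical: issues `work release <id>`
) -> list[str]:
    """Release each work unit whose AWX job has completed; return the released IDs."""
    released = []
    for unit_id in sorted(work_unit_ids):
        if job_status_for_unit(unit_id) not in COMPLETED_STATES:
            continue  # job still active, or the unit is not owned by an AWX job
        try:
            release(unit_id)
        except Exception:
            # Leave the unit in place; the next periodic run will try again.
            continue
        released.append(unit_id)
    return released


if __name__ == "__main__":
    statuses = {"qLL2JFNT": "succeeded", "Zx81Kp0Q": "running"}
    units = ["qLL2JFNT", "Zx81Kp0Q", "Mm4TtR2a"]  # the last unit has no AWX job
    print(reap_work_units(units, statuses.get, release=lambda unit_id: None))  # ['qLL2JFNT']
```

Catching a failed release and moving on keeps one stuck unit from blocking the rest; because the task is periodic, that unit is simply picked up again on the next run.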
## AWX Jobs