AWX Capacity Determination and Job Impact

The AWX capacity system determines how many jobs can run on an Instance given the amount of resources available to the Instance and the size of the jobs that are running (referred to hereafter as Impact). The algorithm used to determine this is based entirely on two things:

  • How much memory is available to the system (mem_capacity)
  • How much CPU is available to the system (cpu_capacity)

Capacity also impacts Instance Groups. Groups are composed of Instances, and an Instance can be assigned to multiple Groups, so impact on one Instance can potentially affect the overall capacity of other Groups.

Instance Groups (not Instances themselves) can be assigned to be used by Jobs at various levels (see Tower Clustering/HA Overview). When the Task Manager is preparing its graph to determine which Group a Job will run on, it will commit the capacity of an Instance Group to a Job that hasn't started or isn't ready to start yet (see Task Manager Overview).

Finally, if only one Instance is available (especially in smaller configurations) for a Job to run, the Task Manager will allow that Job to run on the Instance even if it would push the Instance over capacity. We do this as a way to guarantee that jobs themselves won't get clogged as a result of an under-provisioned system.
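
A minimal Python sketch of that selection policy, assuming each instance exposes a remaining_capacity value; the names and the preference among fitting instances are illustrative, not the Task Manager's actual logic:

```python
# Hypothetical sketch of the policy described above: prefer an instance with
# enough remaining capacity, but if only one instance is available, run the
# job there even if it pushes the instance over capacity.

def pick_instance(instances, task_impact):
    # instances: list of dicts like {"hostname": ..., "remaining_capacity": int}
    fits = [i for i in instances if i["remaining_capacity"] >= task_impact]
    if fits:
        # Which fitting instance is preferred is an assumption made for illustration.
        return max(fits, key=lambda i: i["remaining_capacity"])
    if len(instances) == 1:
        return instances[0]  # over-commit rather than leaving the job stuck
    return None
```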

These concepts mean that, in general, Capacity and Impact are not a zero-sum system relative to Jobs and Instances/Instance Groups.

Resource Determination For Capacity Algorithm

The capacity algorithms are defined in order to determine how many forks a system is capable of running at the same time. This controls how many systems Ansible itself will communicate with simultaneously. Increasing the number of forks an AWX system is running will, in general, allow jobs to run faster by performing more work in parallel. The tradeoff is that this increases the load on the system, which could cause work to slow down overall.

AWX can operate in two modes when determining capacity: mem_capacity (the default) and cpu_capacity. mem_capacity allows you to overcommit CPU resources while protecting the system from running out of memory. If most of your work is not CPU-bound, then selecting this mode will maximize the number of forks.

Memory Relative Capacity

mem_capacity is calculated relative to the amount of memory needed per fork. Taking into account the overhead for AWX's internal components, this comes out to be about 100MB per fork. When considering the amount of memory available to Ansible jobs, the capacity algorithm reserves 2GB of memory to account for the presence of other AWX services. The algorithm itself looks like this:

(mem - 2048) / mem_per_fork

As an example:

(4096 - 2048) / 100 == ~20

So a system with 4GB of memory would be capable of running 20 forks. The value mem_per_fork can be controlled by setting the AWX settings value (or environment variable) SYSTEM_TASK_FORKS_MEM, which defaults to 100.
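
As a rough illustration (a sketch of the formula above, not the actual AWX implementation), assuming memory is reported in megabytes:

```python
# Sketch of the memory-based capacity calculation described above.
# mem_per_fork corresponds to SYSTEM_TASK_FORKS_MEM (default 100 MB per fork);
# 2048 MB is reserved for other AWX services.

def mem_capacity(total_memory_mb, mem_per_fork=100, reserved_mb=2048):
    return (total_memory_mb - reserved_mb) // mem_per_fork

print(mem_capacity(4096))  # (4096 - 2048) / 100 -> 20
```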

CPU-Relative Capacity

Oftentimes, Ansible workloads can be fairly CPU-bound. In these cases, reducing the simultaneous workload sometimes allows tasks to run faster and reduces the average time-to-completion of those jobs.

Just as the AWX mem_capacity algorithm uses the amount of memory needed per fork, the cpu_capacity algorithm looks at the amount of CPU resources needed per fork. The baseline value for this is 4 forks per core. The algorithm itself looks like this:

cpus * fork_per_cpu

For example, in a 4-core system:

4 * 4 == 16

The value fork_per_cpu can be controlled by setting the AWX settings value (or environment variable) SYSTEM_TASK_FORKS_CPU, which defaults to 4.
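
The same calculation, sketched in Python for illustration only:

```python
# Sketch of the CPU-based capacity calculation described above.
# forks_per_cpu corresponds to SYSTEM_TASK_FORKS_CPU (default 4).

def cpu_capacity(cpu_count, forks_per_cpu=4):
    return cpu_count * forks_per_cpu

print(cpu_capacity(4))  # 4 * 4 -> 16
```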

Job Impacts Relative To Capacity

When selecting the capacity, it's important to understand how each job type affects it.

It's helpful to understand what forks mean to Ansible: http://docs.ansible.com/ansible/latest/intro_configuration.html#forks

The default forks value for Ansible is 5. However, if AWX knows that you're running against fewer systems than that, then the actual concurrency value will be lower.

When a job is made to run, AWX will add 1 to the number of forks selected to compensate for the Ansible parent process. So if you are running a playbook against 5 systems with a forks value of 5, then the actual forks value from the perspective of Job Impact will be 6.
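
A hedged sketch of that rule (the function name and the handling of an unset forks value are illustrative, not the exact AWX implementation):

```python
# Sketch of the "forks + 1" execution impact described above. Ansible's default
# forks value is 5, and it never uses more forks than there are hosts.

def playbook_execution_impact(forks, host_count):
    effective_forks = min(forks if forks else 5, host_count)
    return effective_forks + 1  # +1 for the Ansible parent process

print(playbook_execution_impact(forks=5, host_count=5))  # -> 6
```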

Impact of Job Types in AWX

Jobs have two types of impact: task "execution" impact and task "control" impact.

For instances that are the "controller_node" for a task, the impact is set by settings.AWX_CONTROL_NODE_TASK_IMPACT and is the same regardless of the type of job.

For instances that are the "execution_node" for a task, the impact is calculated as follows:

Jobs and Ad-hoc jobs follow the above model, forks + 1.

Other job types have a fixed execution impact:

  • Inventory Updates: 1
  • Project Updates: 1
  • System Jobs: 5

For jobs that execute on the same node that controls them (as on hybrid nodes), both settings.AWX_CONTROL_NODE_TASK_IMPACT and the job's execution impact apply; see the sketch after the examples below.

Examples: Given settings.AWX_CONTROL_NODE_TASK_IMPACT is 1:

  • Project updates (where the execution_node is always the same as the controller_node) have a total impact of 2.
  • Container group jobs (where the execution node is not a member of the cluster) incur only the control impact, so the controller node has a total task impact of 1.
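
A small sketch tying these examples together; the data structures are illustrative, not AWX internals, and it assumes settings.AWX_CONTROL_NODE_TASK_IMPACT is 1:

```python
# Illustration of how control and execution impact combine on an instance,
# assuming AWX_CONTROL_NODE_TASK_IMPACT = 1 as in the examples above.

CONTROL_NODE_TASK_IMPACT = 1

def impact_on(instance, task):
    impact = 0
    if instance == task["controller_node"]:
        impact += CONTROL_NODE_TASK_IMPACT
    if instance == task["execution_node"]:
        impact += task["execution_impact"]
    return impact

# Project update: controlled and executed on the same node, execution impact 1.
project_update = {"controller_node": "ctrl-1", "execution_node": "ctrl-1", "execution_impact": 1}
print(impact_on("ctrl-1", project_update))  # -> 2

# Container group job: execution happens outside the cluster, so only the
# control impact lands on a cluster instance.
container_group_job = {"controller_node": "ctrl-1", "execution_node": None, "execution_impact": 6}
print(impact_on("ctrl-1", container_group_job))  # -> 1
```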

Selecting the Right settings.AWX_CONTROL_NODE_TASK_IMPACT

This setting allows you to determine how much impact controlling jobs has. This can be helpful if you notice symptoms of your control plane exceeding desired CPU or memory usage, as it effectively throttles how many jobs can be run concurrently by your control plane. This is usually a concern with container groups, which at this time effectively have infinite capacity, so it is easy to end up with too many jobs running concurrently, overwhelming the control plane pods with events and control processes.

If you want more throttling behavior, increase the setting. If you want less throttling behavior, lower the setting.
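
As a back-of-the-envelope illustration (the capacity figure here is made up), raising the setting reduces how many jobs a single control node can control at once:

```python
# Hypothetical example: the number of jobs a control node with 100 units of
# capacity could control concurrently under different values of
# AWX_CONTROL_NODE_TASK_IMPACT.

control_capacity = 100
for control_impact in (1, 5, 10):
    print(f"impact={control_impact}: up to {control_capacity // control_impact} jobs controlled concurrently")
```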

Selecting the Right Capacity

Selecting between a memory-focused capacity algorithm and a CPU-focused capacity algorithm for your AWX deployment means you'll be selecting between a minimum and a maximum value. In the above examples, the CPU capacity would allow a maximum of 16 forks while the memory capacity would allow 20. For some systems, the disparity between these can be large, and oftentimes you may want a balance between the two.

An Instance field, capacity_adjustment, allows you to select how much of one or the other you want to consider. It is represented as a value between 0.0 and 1.0. If set to 1.0, then the largest value will be used. In the above example, that would be the memory capacity, so a value of 20 forks would be selected. If set to 0.0, then the smallest value will be used. A value of 0.5 would be a 50/50 balance between the two algorithms, which in this example would be 18:

16 + (20 - 16) * 0.5 == 18
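
A sketch of that blend in Python, for illustration only:

```python
# Sketch of the capacity_adjustment blend: 0.0 uses the smaller of the two
# capacities, 1.0 the larger, and values in between interpolate linearly.

def blended_capacity(cpu_cap, mem_cap, capacity_adjustment):
    low, high = min(cpu_cap, mem_cap), max(cpu_cap, mem_cap)
    return int(low + (high - low) * capacity_adjustment)

print(blended_capacity(16, 20, 0.5))  # 16 + (20 - 16) * 0.5 -> 18
```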