Add max concurrent jobs and max forks per ig

The intention of this feature is primarily to provide some notion of max
capacity of container groups, but the logic I've left generic. Default
is 0, which will be interpereted as no maximum number of jobs or forks.

Includes refactor of variable and method names for clarity.
instances_by_hostname is an internal attribute of TaskManagerInstances.
Clarify when we are expecting the actual TaskManagerInstances object.

Unify how we process running tasks and consume capacity. This has the
effect that we do less expensive work in after_lock_init and have 1 less
loop over all the running tasks. Previously we looped for both building
the dependency graph as well as for calculating the starting capacity of
all the instances and instance groups. Now we acheive both tasks in the
same loop.

Because of how this changes the somewhat subtle "do-si-do" of how to
initialize the Task Manager models, introduce a wrapper class that tries
to take some of that burden off of other areas where we re-use this like
in the serializer and the metrics. Also use this wrapper class to handle
nicities of how to track capacity consumption on instances and instance
groups.

Add tests for max_forks and max_concurrent_jobs

Fixup tests that use TaskManagerModels to accomodate changes.

assign ig before call to consume capacity

if we don't do it in that order, then we don't correctly account for
the container group jobs we are starting in the middle of the task
manager run
This commit is contained in:
Elijah DeLee
2022-10-23 23:20:41 -04:00
parent 65c3db8cb8
commit 86856f242a
11 changed files with 490 additions and 185 deletions

View File

@@ -16,7 +16,7 @@ from awx.conf.license import get_license
from awx.main.utils import get_awx_version, camelcase_to_underscore, datetime_hook
from awx.main import models
from awx.main.analytics import register
from awx.main.scheduler.task_manager_models import TaskManagerInstances
from awx.main.scheduler.task_manager_models import TaskManagerModels
"""
This module is used to define metrics collected by awx.main.analytics.gather()
@@ -237,11 +237,10 @@ def projects_by_scm_type(since, **kwargs):
def instance_info(since, include_hostnames=False, **kwargs):
info = {}
# Use same method that the TaskManager does to compute consumed capacity without querying all running jobs for each Instance
active_tasks = models.UnifiedJob.objects.filter(status__in=['running', 'waiting']).only('task_impact', 'controller_node', 'execution_node')
tm_instances = TaskManagerInstances(
active_tasks, instance_fields=['uuid', 'version', 'capacity', 'cpu', 'memory', 'managed_by_policy', 'enabled', 'node_type']
)
for tm_instance in tm_instances.instances_by_hostname.values():
tm_models = TaskManagerModels.init_with_consumed_capacity(
instance_fields=['uuid', 'version', 'capacity', 'cpu', 'memory', 'managed_by_policy', 'enabled']
)
for tm_instance in tm_models.instances.instances_by_hostname.values():
instance = tm_instance.obj
instance_info = {
'uuid': instance.uuid,