External-Mirrors/awx

mirror of https://github.com/ansible/awx.git synced 2026-03-08 21:19:26 -02:30

Author	SHA1	Message	Date
Seth Foster	2f82b75748	Add subsystem metrics for task manager	2022-06-14 11:00:11 -04:00
Seth Foster	eba4a3f1c2	in case we fail a job in task manager, we need to add the project update to the inventoryupdate.source_project field	2022-05-12 15:21:17 -04:00
Seth Foster	0ae9fe3624	if dependency fails, fail job in task manager	2022-05-12 14:00:13 -04:00
Seth Foster	1b662fcca5	SCM inv source trigger project update - scm based inventory sources should launch project updates prior to running inventory updates for that source. - fixes scenario where a job is based on projectA, but the inventory source is based on projectB. Running the job will likely trigger a sync for projectA, but not projectB. comments	2022-05-12 14:00:12 -04:00
Alan Rominger	cb63d92bbf	Remove committed_capacity field, delete supporting code (#12086) * Remove committed_capacity field, delete supporting code * Track consumed capacity to solve the negatives problem * Use more verbose name for IG queryset	2022-04-22 13:41:32 -04:00
Elijah DeLee	689a216726	move static methods used by task manager (#12050 ) * move static methods used by task manager These static methods were being used to act on Instance-like objects that were SimpleNamespace objects with the necessary attributes. This change introduces dedicated classes to replace the SimpleNamespace objects and moves the formerlly staticmethods to a place where they are more relevant instead of tacked onto models to which they were only loosly related. Accept in-memory data structure in init methods for tests * initialize remaining capacity AFTER we built map of instances	2022-04-21 13:05:06 -04:00
Elijah DeLee	e24fc43a45	Revert "Only fetch fields we need in task manager" This reverts commit `868e811b3f`. Turns out this does not play well with polymorphic models. Will try again with .defer()	2022-04-14 11:55:33 -04:00
Elijah DeLee	868e811b3f	Only fetch fields we need in task manager By using .only we select fewer columns, avoiding potentially large fields that we never reference. Also, small tweak to eliminate what was a duplicate dictionary of hostname:instance, because we don't need build and carry two copies of the same data.	2022-04-13 17:24:33 -04:00
Elijah DeLee	2e9974133a	calculate remaining capacity in static method this is to avoid additional queries when we allready have all the active jobs fetched in the task manager	2022-04-13 11:56:07 -04:00
Elijah DeLee	4328b4cb67	drop call that queries all running and waiting jobs this is to fix one more place in the task manager where we end up querying all running and waiting jobs. Partial fix for https://github.com/ansible/awx/issues/11671	2022-04-12 10:31:47 -04:00
Jeff Bradberry	b852baaa39	Fix up logger .warn() calls to use .warning() instead This is a usage that was deprecated in Python 3.0.	2022-03-07 18:11:36 -05:00
Jeff Bradberry	a3a216f91f	Fix up new Django 3.0 deprecations Mostly text based: force/smart_text, ugettext_*	2022-03-07 18:11:36 -05:00
Elijah DeLee	38f50f014b	fix missing job lifecycle messages (#11801) we were missing these messages for control type jobs that call start_task earlier than other types of jobs	2022-02-23 13:56:25 -05:00
Elijah DeLee	921b2bfb28	drop unused logic in task manager There is no current need or use to keep a seperate dependency graph for each instance group. In the interest of making it clearer what the current code does, eliminate this superfluous complication. We are no longer ever referencing any accounting of instance group capacity, instead we only look at capacity on intances.	2022-02-14 16:15:03 -05:00
Elijah DeLee	604cbc1737	Consume control capacity (#11665 ) * Select control node before start task Consume capacity on control nodes for controlling tasks and consider remainging capacity on control nodes before selecting them. This depends on the requirement that control and hybrid nodes should all be in the instance group named 'controlplane'. Many tests do not satisfy that requirement. I'll update the tests in another commit. * update tests to use controlplane We don't start any tasks if we don't have a controlplane instance group Due to updates to fixtures, update tests to set node type and capacity explicitly so they get expected result. * Fixes for accounting of control capacity consumed Update method is used to account for currently consumed capacity for instance groups in the in-memory capacity tracking data structure we initialize in after_lock_init and then update via calculate_capacity_consumed (both in task_manager.py) Also update fit_task_to_instance to consider control impact on instances Trust that these functions do the right thing looking for a node with capacity, and cut out redundant check for the whole group's capacity per Alan's reccomendation. * Refactor now redundant code Deal with control type tasks before we loop over the preferred instance groups, which cuts out the need for some redundant logic. Also, fix a bug where I was missing assigning the execution node in one case! * set job explanation on tasks that need capacity move the job explanation for jobs that need capacity to a function so we can re-use it in the three places we need it. * project updates always run on the controlplane Instance group ordering makes no sense on project updates because they always need to run on the control plane. Also, since hybrid nodes should always run the control processes for the jobs running on them as execution nodes, account for this when looking for a execution node. 
* fix misleading message the variables and wording were both misleading, fix to be more accurate description in the two different cases where this log may be emitted. * use settings correctly use settings.DEFAULT_CONTROL_PLANE_QUEUE_NAME instead of a hardcoded name cache the controlplane_ig object during the after lock init to avoid an uneccesary query eliminate mistakenly duplicated AWX_CONTROL_PLANE_TASK_IMPACT and use only AWX_CONTROL_NODE_TASK_IMPACT * add test for control capacity consumption add test to verify that when there are 2 jobs and only capacity for one that one will move into waiting and the other stays in pending * add test for hybrid node capacity consumption assert that the hybrid node is used for both control and execution and capacity is deducted correctly * add test for task.capacity_type = control Test that control type tasks have the right capacity consumed and get assigned to the right instance group Also fix lint in the tests * jobs_running not accurate for control nodes We can either NOT use "idle instances" for control nodes, or we need to update the jobs_running property on the Instance model to count jobs where the node is the controller_node. I didn't do that because it may be an expensive query, and it would be hard to make it match with jobs_running on the InstanceGroup which filters on tasks assigned to the instance group. This change chooses to stop considering "idle" control nodes an option, since we can't acurrately identify them. The way things are without any change, is we are continuing to over consume capacity on control nodes because this method sees all control nodes as "idle" at the beginning of the task manager run, and then only counts jobs started in that run in the in-memory tracking. 
So jobs which last over a number of task manager runs build up consuming capacity, which is accurately reported via Instance.consumed_capacity * Reduce default task impact for control nodes This is something we can experiment with as far as what users want at install time, but start with just 1 for now. * update capacity docs Describe usage of the new setting and the concept of control impact. Co-authored-by: Alan Rominger <arominge@redhat.com> Co-authored-by: Rebeccah <rhunter@redhat.com>	2022-02-14 10:13:22 -05:00
Amol Gautam	a4a3ba65d7	Refactored tasks.py to a package --- Added 3 new sub-package : awx.main.tasks.system , awx.main.tasks.jobs , awx.main.tasks.receptor --- Modified the functional tests and unit tests accordingly	2022-01-14 11:55:41 -05:00
Jeff Bradberry	f340f491dc	Control the visibility and use of hop node Instances - the list, detail, and health check API views should not include them - the Instance-InstanceGroup association views should not allow them to be changed - the ping view excludes them - list_instances management command excludes them - Instance.set_capacity_value sets hop nodes to 0 capacity - TaskManager will exclude them from the nodes available for job execution - TaskManager.reap_jobs_from_orphaned_instances will consider hop nodes to be an orphaned instance - The apply_cluster_membership_policies task will not manipulate hop nodes - get_broadcast_hosts will ignore hop nodes - active_count also will ignore hop nodes	2021-12-17 14:30:28 -05:00
Elijah DeLee	e10030b73d	Allow setting default execution group pod spec This will allow us to control the default container group created via settings, meaning we could set this in the operator and the default container group would get created with it applied. We need this for https://github.com/ansible/awx-operator/issues/242 Deepmerge the default podspec and the override With out this, providing the `spec` for the podspec would override everything contained, which ends up including the container used, which is not desired Also, use the same deepmerge function def, as the code seems to be copypasted from the utils	2021-12-10 15:02:45 -05:00
Alan Rominger	b721a4b361	Remove dev-only log filters and downgrade periodic logs	2021-12-07 14:35:02 -05:00
chris meyers	9f8250bd47	add events to job lifecycle * Note in the job lifecycle when the controller_node and execution_node are chosen. This event occurs most commonly in the task manager with a couple of exceptions that happen when we dynamically create dependenct jobs on the fly in tasks.py	2021-11-10 08:50:16 +08:00
Alan Rominger	62e9e7ea80	Avoid setting controller_node to an execution node for container jobs (#11117)	2021-09-23 09:16:10 -04:00
Alan Rominger	daf4310176	Clean up work_type processing and fix execution vs control capacity (#10930 ) * Clean up added work_type processing for mesh_code branch * track both execution and control capacity * Remove unused execution_capacity property * Count all forms of capacity to make test pass * Force jobs to be on execution nodes, updates on control nodes * Introduce capacity_type property to abstract some details out * Update test to cover all job types at same time * Register OpenShift nodes as control types * Remove unqualified consumed_capacity from task manager and make unit tests work * Remove unqualified consumed_capacity from task manager and make unit tests work * Update unit test to execution vs control TM logic changes * Fix bug, else handling for work_type method	2021-08-26 07:24:14 -04:00
Alan Rominger	c3ad479fc6	Minor tweaks for the mesh_code branch from review (#10902)	2021-08-24 08:41:35 -04:00
beeankha	1a9fcdccc2	Change place where controller node is being looked for in the task manager	2021-08-24 08:41:35 -04:00
Alan Rominger	f47eb126e2	Adopt the node_type field in receptor logic (#10802) * Adopt the node_type field in receptor logic * Refactor Instance.objects.register so we do not reset capacity to 0	2021-08-24 08:41:34 -04:00
Alan Rominger	13300bdbd4	Update rebase to keep old control plane capacity check Also do some basic work to separate control versus execution capacity this is to assure that we don't send jobs to the control node	2021-08-24 08:40:19 -04:00
Ryan Petrello	05cb876df5	implement an initial development environment for receptor-based clusters	2021-08-24 08:40:18 -04:00
Shane McDonald	ec8ac6f1a7	Introduce distinct controlplane instance group	2021-06-07 11:25:59 -04:00
Jim Ladd	84af610a1f	remove rebase cruft	2021-06-04 09:17:09 -07:00
Jim Ladd	7b188aafea	lint	2021-06-04 09:17:09 -07:00
Ryan Petrello	c7ab3ea86e	move the partition data migration to be a post-upgrade async process this copies the approach we took with the bigint migration	2021-06-04 09:17:07 -07:00
Jim Ladd	67046513ae	Push changes before rebasing	2021-06-04 09:17:07 -07:00
Jim Ladd	0eb1984b22	Only create partitions for regular jobs	2021-06-04 09:17:06 -07:00
Jim Ladd	c87d7b0d79	fix import	2021-06-04 09:17:06 -07:00
Jim Ladd	612e91263c	auto-create partition	2021-06-04 09:17:06 -07:00
Christian M. Adams	fe02c0b157	Fix error msg wording and sdb docs	2021-06-03 14:24:18 -04:00
Christian M. Adams	36f47f3696	The list secrets role rule is no longer not needed for container groups	2021-05-26 14:38:56 -04:00
Christian M. Adams	536c02dc55	Simplify hostname parsing	2021-05-25 15:19:40 -04:00
Christian M. Adams	d607dfd5d8	Added error handling for pull secret creation requests - Check (only) the existing secret to see if it's value would change.	2021-05-25 14:58:01 -04:00
Christian M. Adams	cea6d8c3cb	Use utf-8 & properly parse hostname from registry URL	2021-05-25 14:44:42 -04:00
Christian M. Adams	8316a1d198	Create pull secret in cluster and use it in PodSpec - base64 encode secret values before creating the secret - Construct valid .dockerconfigjson - Cancel jobs where it will obviously fail & error handling - Check if the secret exists first, then attempts to replace it if it does.	2021-05-25 14:44:42 -04:00
Yanis Guenane	562f78e53d	Rename awx to automation for pod names	2021-05-04 14:17:45 +02:00
Bill Nottingham	c8cf28f266	Assorted renaming and string changes	2021-04-30 14:32:05 -04:00
softwarefactory-project-zuul[bot]	6bea5dd294	Merge pull request #9957 from jbradberry/isolated-removal Isolated removal SUMMARY Removal of the isolated nodes feature. ISSUE TYPE Feature Pull Request COMPONENT NAME API AWX VERSION Reviewed-by: Alan Rominger <arominge@redhat.com> Reviewed-by: Jeff Bradberry <None> Reviewed-by: Elyézer Rezende <None> Reviewed-by: Bianca Henderson <beeankha@gmail.com>	2021-04-29 19:15:43 +00:00
Alan Rominger	67f7998ab9	Modify formatting in response to black update	2021-04-26 10:51:27 -04:00
Jeff Bradberry	1819a7963a	Make the necessary changes to the models - remove InstanceGroup.controller - remove Instance.last_isolated_check - remove .is_isolated and .is_controller methods/properties - remove .choose_online_controller_node() method - remove .supports_isolation() and replace with .can_run_containerized - simplify .can_run_containerized	2021-04-22 10:17:02 -04:00
Ryan Petrello	300f5a3a1f	use flake8 to lint for a few things black doesn't catch black does not warn about missing or extraneous imports, so let's bring back flake8 in our linting to check for them	2021-04-12 12:55:39 -04:00
Shane McDonald	2d48b24ef2	Update pod reaper to work with receptor launched pods	2021-04-05 17:45:15 -04:00
Ryan Petrello	c2ef0a6500	move code linting to a stricter pep8-esque auto-formatting tool, black	2021-03-23 09:39:58 -04:00
Ryan Petrello	f850f8d3e0	introduce a new global flag for denoating K8S-based deployments - In K8S-based installs, only container groups are intended to be used for playbook execution (JTs, adhoc, inventory updates), so in this scenario, other job types have a task impact of zero. - In K8S-based installs, traditional instances have zero capacity (because they're only members of the control plane where services - http/s, local control plane execution - run) - This commit also includes some changes that allow for the task manager to launch tasks with task_impact=0 on instances that have capacity=0 (previously, an instance with zero capacity would never be selected as the "execution node" This means that when IS_K8S=True, any Job Template associated with an Instance Group will never actually go from pending -> running (because there's no capacity - all playbooks must run through Container Groups). For an improved ux, our intention is to introduce logic into the operator install process such that the default group that's created at install time is a Container Group that's configured to point at the K8S cluster where awx itself is deployed.	2021-03-03 18:52:55 -05:00


325 Commits