External-Mirrors/awx

mirror of https://github.com/ansible/awx.git synced 2026-02-15 10:10:01 -03:30

Author	SHA1	Message	Date
Gabriel Muniz	e15f4de0dd	Fix race with heartbeat and reaper logic (#13713) * Fix race with heartbeat and reaper logic * Fix tests to fail when over drift over heartbeat time * replaced modified with started time for reap() code and added test * fixed logic bug and cleaned up tests * Added comments to tests to call out reasoning	2023-03-17 14:24:31 -04:00
Alan Rominger	a64467c5a6	Shortcut Instance.objects.me when possible	2022-10-05 09:11:42 -04:00
Alan Rominger	ccd46a1c0f	Move reaper logic into worker, avoiding bottlenecks	2022-08-17 15:42:47 -04:00
Shane McDonald	16be38bb54	Allow for passing custom job_explanation to reaper methods Co-authored-by: Alan Rominger <arominge@redhat.com>	2022-08-17 11:41:49 -04:00
Shane McDonald	3c51cb130f	Add grace period settings for task manager timeout, and pod / job waiting reapers Co-authored-by: Alan Rominger <arominge@redhat.com>	2022-08-17 11:39:01 -04:00
Alan Rominger	278db2cdde	Split reaper for running and waiting jobs Avoid running jobs that have already been reaped Co-authored-by: Elijah DeLee <kdelee@redhat.com> Remove unnecessary extra actions Fix waiting jobs in other cases of reaping	2022-08-17 10:53:29 -04:00
Alan Rominger	585d3f4e2a	Register system again if deleted by another pod Avoid cases where missing instance would throw error on startup this gives time for heartbeat to register it	2022-08-08 22:36:17 -04:00
Alan Rominger	fd671ecc9d	Give specific messages if job was killed due to SIGTERM or SIGKILL (#12435) * Reap jobs on dispatcher startup to increase clarity, replace existing reaping logic * Exit jobs if receiving SIGTERM signal * Fix unwanted reaping on shutdown, let subprocess close out * Add some sanity tests for signal module * Add a log for an unhandled dispatcher error * Refine wording of error messages Co-authored-by: Elijah DeLee <kdelee@redhat.com>	2022-06-30 13:20:08 -04:00
Alan Rominger	fe5736dc7f	Specifically abort the reaper if instance not registered	2022-03-29 14:08:58 -04:00
Bill Nottingham	c8cf28f266	Assorted renaming and string changes	2021-04-30 14:32:05 -04:00
Ryan Petrello	c2ef0a6500	move code linting to a stricter pep8-esque auto-formatting tool, black	2021-03-23 09:39:58 -04:00
Buymov Ivan	f2676064fd	Fix error with rejoining node to cluster after lost connection to postgres	2019-09-27 01:17:27 -04:00
Ryan Petrello	38bf174bda	don't reap jobs that aren't running this is a simple sanity check, but it should help us avoid shooting ourselves in the foot in complicated scenarios, such as: 1. A dispatcher worker is running a job, and it's killed with `kill -9` 2. The dispatcher attempts to reap jobs with a matching celery_task_id 3. The associated sync project update has the same celery_task_id (an implementation detail of how we implemented that), and it ends up getting reaped _even though_ it's already finished and has status=successful	2018-11-28 18:11:12 -05:00
Ryan Petrello	0d29bbfdc6	make the dispatcher more fault-tolerant to prolonged database outages	2018-10-18 20:00:07 -04:00
Ryan Petrello	ff1e8cc356	replace celery task decorators with a kombu-based publisher this commit implements the bulk of `awx-manage run_dispatcher`, a new command that binds to RabbitMQ via kombu and balances messages across a pool of workers that are similar to celeryd workers in spirit. Specifically, this includes: - a new decorator, `awx.main.dispatch.task`, which can be used to decorate functions or classes so that they can be designated as "Tasks" - support for fanout/broadcast tasks (at this point in time, only `conf.Setting` memcached flushes use this functionality) - support for job reaping - support for success/failure hooks for job runs (i.e., `handle_work_success` and `handle_work_error`) - support for auto scaling worker pool that scale processes up and down on demand - minimal support for RPC, such as status checks and pool recycle/reload	2018-10-11 10:53:30 -04:00

15 Commits