External-Mirrors/awx

mirror of https://github.com/ansible/awx.git synced 2026-02-03 18:48:12 -03:30

Author	SHA1	Message	Date
Alan Rominger	998000bfbe	Surface correct error from bulk_create on unrecoverable error	2022-08-10 16:16:57 -04:00
Alan Rominger	43a50cc62c	Fix event counting in error handling path	2022-08-10 16:16:57 -04:00
Alan Rominger	30f556f845	Further resiliency changes focused on offline database. Make logs from database outage more manageable. Raise exception if update_model never recovers from problem.	2022-08-10 16:16:57 -04:00
Rebeccah	5f9326b131	Added an average event processing metric (in seconds) that can be served to Grafana via Prometheus. This metric is a good indicator of how far behind the callback receiver is: the higher the load, the further behind it falls and the greater the number of seconds the metric displays. A high value may indicate the need for horizontal scaling in the control plane or for vertically scaling the number of callback receivers.	2022-06-06 15:14:56 -04:00
Alan Rominger	1e6ca01686	Fix the callback receiver --status command	2022-05-19 15:00:49 -04:00
Alan Rominger	29d60844a8	Fix notification timing issue by sending in the latter of 2 events (#12110) * Track host_status_counts and use that to process notifications * Remove now unused setting * Back out changes to callback class not needed after all * Skirt the need for duck typing by leaning on the cached field * Delete tests for deleted task * Revert "Back out changes to callback class not needed after all" This reverts commit 3b8ae350d218991d42bffd65ce4baac6f41926b2. * Directly hardcode stats_event_type for callback class * Fire notifications if stats event was never sent * Remove test content for deleted methods * Add placeholder for when no hosts matched * Make field default be None, denote events processed with empty dict * Make UI process null value for host_status_counts * Fix tracking of EOF dispatch for system jobs * Reorganize EVENT_MAP into class properties * Consolidate conditional I missed from EVENT_MAP refactor * Give up on the null condition, also applies for empty hosts * Remove cls position argument not being used * Move wrapup method out of class, add tests	2022-04-29 13:54:31 -04:00
Jeff Bradberry	1803c5bdb4	Fix up usage of django-guid. It has replaced the class-based middleware; everything is function-based now.	2022-03-14 13:19:57 -04:00
Seth Foster	6db7cea148	variable name changes	2022-02-10 10:57:00 -05:00
Seth Foster	3993aa9524	Add metric for number of events emitted over websocket broadcast	2022-02-09 21:57:01 -05:00
Amol Gautam	a4a3ba65d7	Refactored tasks.py to a package --- Added 3 new sub-packages: awx.main.tasks.system, awx.main.tasks.jobs, awx.main.tasks.receptor --- Modified the functional tests and unit tests accordingly	2022-01-14 11:55:41 -05:00
Alan Rominger	210d5084f0	Move skip flag up from event_data and pop it off	2021-06-08 13:33:54 -04:00
Alan Rominger	b551608f16	Move websocket skip logic into event_handler	2021-06-08 13:33:22 -04:00
Seth Foster	0c569c67fd	Add subsystem metrics - Adds a Metrics() class that can track data such as number of events the callback receiver inserted into database - Exposes this metric data at the api/v2/metrics/ endpoint. This data is prometheus-friendly - Metric data is stored in memory, then periodically saved to Redis. - Metric data is periodically broadcast to other nodes in the cluster, so that each node has a copy of the most recent metric data collected.	2021-03-25 15:23:52 -04:00
Ryan Petrello	c2ef0a6500	move code linting to a stricter pep8-esque auto-formatting tool, black	2021-03-23 09:39:58 -04:00
Ryan Petrello	3cc3cf1f80	add a per-request GUID and log it as it travels through background services; see: https://github.com/ansible/awx/issues/9329	2021-02-17 12:54:13 -05:00
Ryan Petrello	b744c4ebb7	further optimize callback receiver buffering for certain situations; see: https://github.com/ansible/awx/issues/9085	2021-01-14 17:17:12 -05:00
Chris Meyers	eb47c8dbc6	centralize reusable profiling code	2020-10-27 08:21:41 -04:00
Ryan Petrello	baad765179	refactor some callback receiver code; the bigint migration removed the foreign key constraints for: - host_id - job_id (and projectupdate_id, etc...); because of this, we don't really need to check explicitly for a host_id IntegrityError anymore (because it won't occur); additionally, while it's possible to insert an event with a mismatched job_id now (for example, you can totally start a long-running job, and delete the job record in the background using the ORM or psql), doing so results in DoesNotExist errors in the code that handles the playbook_on_stats events	2020-09-25 13:12:42 -04:00
Ryan Petrello	cd0b9de7b9	remove multiprocessing.Queue usage from the callback receiver; instead, just have each worker connect directly to redis; this has a few benefits: - it's simpler to explain and debug - back pressure on the queue keeps messages around in redis (which is observable, and survives the restart of Python processes) - it's likely notably more performant at high loads	2020-09-24 13:53:58 -04:00
Christian Adams	a899a147e1	Fix new flake8 from pyflakes 2.2.0 release	2020-04-20 09:50:50 -04:00
Ryan Petrello	d40a5dec8f	change when we send job notifications to avoid a race condition; success/failure notifications for playbooks include summary data about the hosts based on the contents of the playbook_on_stats event; the current implementation suffers from a number of race conditions that sometimes can cause that data to be missing or incomplete; this change makes it so that for playbooks we build (and send) the notification in response to the playbook_on_stats event, not the EOF event	2020-03-19 10:01:52 -04:00
Ryan Petrello	78b00652bd	add the ability to enable profiling for the callback receiver workers	2020-01-27 12:03:53 -05:00
Bill Nottingham	4e46d5d7cd	Fix some lint	2020-01-20 17:15:27 -05:00
Ryan Petrello	8bd9233d2c	remove some unnecessary callback receiver debugging code	2020-01-14 14:21:53 -05:00
Ryan Petrello	306f504fb7	optimize the callback receiver to buffer writes on high throughput; additionally, optimize away several per-event host lookups and changed/failed propagation lookups; we've always performed these (fairly expensive) queries on every event save - if you're processing tens of thousands of events in short bursts, this is way too slow; this commit also introduces a new command for profiling the insertion rate of events, `awx-manage callback_stats`; see: https://github.com/ansible/awx/issues/5514	2020-01-14 12:04:26 -05:00
Ryan Petrello	17a803f49c	remove the old callback plugin import paths and callback-specific tests	2019-04-12 16:11:23 -04:00
Ryan Petrello	0391dbc292	add additional DB retry logic to the callback receiver; initially, I implemented this for _only_ the task worker, but it's probably needed for callback event workers, too	2018-11-29 11:57:46 -05:00
AlanCoding	482395eb6a	reduce default verbosity of devel-specific callback logging	2018-10-26 10:03:46 -04:00
Ryan Petrello	53ae05094e	use the proper logger for the callback receiver	2018-10-17 10:56:29 -04:00
Ryan Petrello	ff1e8cc356	replace celery task decorators with a kombu-based publisher; this commit implements the bulk of `awx-manage run_dispatcher`, a new command that binds to RabbitMQ via kombu and balances messages across a pool of workers that are similar to celeryd workers in spirit. Specifically, this includes: - a new decorator, `awx.main.dispatch.task`, which can be used to decorate functions or classes so that they can be designated as "Tasks" - support for fanout/broadcast tasks (at this point in time, only `conf.Setting` memcached flushes use this functionality) - support for job reaping - support for success/failure hooks for job runs (i.e., `handle_work_success` and `handle_work_error`) - support for an auto-scaling worker pool that scales processes up and down on demand - minimal support for RPC, such as status checks and pool recycle/reload	2018-10-11 10:53:30 -04:00

30 Commits