Major fixes for job callback receiver processing

* Add logic to the Ansible callback plugin to prevent it from waiting forever to submit events to Tower
* Lower the process recycle threshold for the Tower callback receiver
* Make the recycle threshold configurable
* Properly exit the main callback receiver management process if the event receiver process is dead, so we don't leave dead worker processes
* Set a configurable maximum number of messages that can be waiting in a worker process queue before it is skipped, instead of filling up memory on a dead worker process
* Skip over a dead worker process if its queue is full
* Force restart of the callback receiver if all queues are dead
* Roll back transaction.atomic on the suspicion that it is causing deadlocks in the worker process; use the old commit_on_success mechanism with retry logic
* Separate the expected non-blocking queue exception from any other type of exception that could be encountered on the queue fetch operation
2026-05-15 13:27:40 -02:30 · 2015-03-13 11:11:49 -04:00
parent 6258035ca8
commit 0f5beca9ae
3 changed files with 65 additions and 47 deletions
--- a/awx/settings/defaults.py
+++ b/awx/settings/defaults.py
@@ -317,6 +317,14 @@ ANSIBLE_FORCE_COLOR = True
 # the celery task.
 AWX_TASK_ENV = {}

+# Maximum number of job events processed by the callback receiver worker process
+# before it recycles
+JOB_EVENT_RECYCLE_THRESHOLD = 3000
+
+# Maximum number of job events that can be waiting on a single worker queue before
+# it can be skipped as too busy
+JOB_EVENT_MAX_QUEUE_SIZE = 100
+
 # Flag to enable/disable updating hosts M2M when saving job events.
 CAPTURE_JOB_EVENT_HOSTS = False
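
The queue-skipping behaviour described in the commit message can be illustrated roughly as follows. This is a minimal sketch under stated assumptions, not the code from this commit: `Worker`, `dispatch`, and `AllWorkersUnavailable` are hypothetical names, an in-process `queue.Queue` stands in for the real worker process queue, and the "too busy" test is assumed to be a simple size comparison against `JOB_EVENT_MAX_QUEUE_SIZE`.

```python
import queue

# Value taken from the defaults.py hunk above.
JOB_EVENT_MAX_QUEUE_SIZE = 100


class AllWorkersUnavailable(Exception):
    """Hypothetical signal that the callback receiver should be restarted."""


class Worker:
    """Hypothetical stand-in for one callback receiver worker process."""

    def __init__(self, max_size=JOB_EVENT_MAX_QUEUE_SIZE):
        self.queue = queue.Queue()
        self.max_size = max_size

    def is_too_busy(self):
        # Assumption: a worker counts as too busy (or dead and no longer
        # draining) once its backlog reaches the configured maximum.
        return self.queue.qsize() >= self.max_size


def dispatch(workers, event, start=0):
    """Put event on the first worker, beginning at start, that has room.

    Returns the index of the worker that accepted the event. Raises
    AllWorkersUnavailable when every queue is full.
    """
    count = len(workers)
    for offset in range(count):
        index = (start + offset) % count
        worker = workers[index]
        if worker.is_too_busy():
            # Skip instead of piling more events onto a stalled worker.
            continue
        worker.queue.put(event)
        return index
    raise AllWorkersUnavailable("every worker queue is full")
```

A caller would catch `AllWorkersUnavailable` and restart the receiver, which corresponds to the "force restart if all queues are dead" item in the commit message. Bounding the backlog per worker keeps a dead worker from accumulating events in memory indefinitely.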