Major fixes for job callback receiver processing

* Add logic to ansible callback plugin to prevent it from waiting
  forever to submit events to Tower
* Lower process recycle threshold for tower callback receiver
* Make recycle threshold configurable
* Properly exit the main callback receiver management process if
  the event receiver process is dead so we don't leave dead worker
  processes
* Set a configurable maximum number of messages that can be waiting
  in a worker process queue before it is skipped instead of filling
  up memory on a dead worker process
* Skip over a dead worker process if its queue is full
* Force restart callback receiver if all queues are dead
* Roll back transaction.atomic on the suspicion that it is causing
  deadlocks in the worker process.  Use the old commit_on_success
  mechanism with retry logic
* Separate the expected nonblocking queue exception from any other type
  of exception that could be encountered on the queue fetch operation
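
The queue-skipping behaviour described in the list above can be sketched roughly as follows. This is an illustration only, not the code from this commit: `FakeWorker` and `dispatch` are hypothetical names, an in-process `queue.Queue` stands in for the real worker queues, and only the `JOB_EVENT_MAX_QUEUE_SIZE` name and its default of 100 come from the settings change.

```python
import queue

# Default taken from the settings change in this commit.
JOB_EVENT_MAX_QUEUE_SIZE = 100


class FakeWorker:
    """Hypothetical stand-in for a worker process and its message queue."""

    def __init__(self):
        self.queue = queue.Queue()


def dispatch(workers, message, start=0, max_size=JOB_EVENT_MAX_QUEUE_SIZE):
    """Hand `message` to the first worker whose queue is below `max_size`.

    Workers are tried round-robin beginning at index `start`. Returns the
    index of the worker that accepted the message, or None when every
    queue is full, which the caller can treat as "restart the receiver".
    """
    total = len(workers)
    for offset in range(total):
        index = (start + offset) % total
        worker = workers[index]
        if worker.queue.qsize() >= max_size:
            # Queue is backed up (the worker may be dead): skip it rather
            # than letting messages pile up in memory.
            continue
        worker.queue.put(message)
        return index
    return None
```

A `None` return corresponds to the "all queues are dead" case, where the commit forces a restart of the callback receiver.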
This commit is contained in:
Matthew Jones
2015-03-13 11:11:49 -04:00
parent 6258035ca8
commit 0f5beca9ae
3 changed files with 65 additions and 47 deletions

@@ -317,6 +317,14 @@ ANSIBLE_FORCE_COLOR = True
# the celery task.
AWX_TASK_ENV = {}
# Maximum number of job events processed by the callback receiver worker process
# before it recycles
JOB_EVENT_RECYCLE_THRESHOLD = 3000
# Maximum number of job events that can be waiting on a single worker queue before
# it can be skipped as too busy
JOB_EVENT_MAX_QUEUE_SIZE = 100
# Flag to enable/disable updating hosts M2M when saving job events.
CAPTURE_JOB_EVENT_HOSTS = False
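
As a rough illustration of how a recycle threshold like the one above might be consumed, the sketch below shows a worker loop that stops after a fixed number of events and treats an empty queue separately from other fetch errors. It assumes an in-process `queue.Queue`; `run_worker`, `handle_event`, and `max_idle_polls` are hypothetical names introduced for this example, and only `JOB_EVENT_RECYCLE_THRESHOLD` and its value come from the diff.

```python
import queue

# Default taken from the settings change in this commit.
JOB_EVENT_RECYCLE_THRESHOLD = 3000


def run_worker(event_queue, handle_event,
               threshold=JOB_EVENT_RECYCLE_THRESHOLD, max_idle_polls=3):
    """Process events until `threshold` is reached, then return so the
    caller can recycle (replace) this worker.

    `max_idle_polls` is a hypothetical knob that lets the sketch terminate
    when the queue stays empty; a real worker would keep waiting.
    Returns the number of events processed.
    """
    processed = 0
    idle_polls = 0
    while processed < threshold and idle_polls < max_idle_polls:
        try:
            event = event_queue.get_nowait()
        except queue.Empty:
            # Expected outcome of a nonblocking fetch: nothing to do yet.
            idle_polls += 1
            continue
        except Exception:
            # Any other fetch failure is unexpected; stop this worker.
            break
        idle_polls = 0
        handle_event(event)
        processed += 1
    return processed
```

Keeping `queue.Empty` in its own `except` clause means a quiet queue is never confused with a genuine fetch failure.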