add setting for notification job status retry loop

* We trigger notifications when the callback receiver processes the
playbook_on_stats event. This is the last event in ansible-playbook and
the process should exist very shortly after this event is emitted. The
trouble comes in with the isolated node feature. There is a management
playbook that runs periodically that pulls the events from the remote
node. It's possible that the management playbooks runs, gets the
playbook_on_stats event, but does not see that the playbook is finished
running. Therefore the job status is still seen as 'running' BUT we have
kicked of the notification for the job. The notification worker will
enter a loop waiting on the job to enter the finished state. In this
case the time it takes for the job to enter the finished state can be
long, roughly 2 * the management playbook run time.
* This new setting allows the user to increase the time that the
notification spends waiting for the job to enter the finished state.
This commit is contained in:
Chris Meyers 2021-08-12 12:55:08 -04:00 committed by Alan Rominger
parent afbd9f04d7
commit 59bd73bff8
No known key found for this signature in database
GPG Key ID: C2D7EAAA12B63559
2 changed files with 4 additions and 1 deletions

View File

@ -846,7 +846,7 @@ def handle_work_error(task_id, *args, **kwargs):
def handle_success_and_failure_notifications(job_id):
uj = UnifiedJob.objects.get(pk=job_id)
retries = 0
while retries < 5:
while retries < settings.AWX_NOTIFICATION_JOB_FINISH_MAX_RETRY:
if uj.finished:
uj.send_notification_templates('succeeded' if uj.status == 'successful' else 'failed')
return

View File

@ -983,6 +983,9 @@ BROADCAST_WEBSOCKET_NEW_INSTANCE_POLL_RATE_SECONDS = 10
# How often websocket process will generate stats
BROADCAST_WEBSOCKET_STATS_POLL_RATE_SECONDS = 5
# Number of times to retry sending a notification when waiting on a job to finish.
AWX_NOTIFICATION_JOB_FINISH_MAX_RETRY = 5
DJANGO_GUID = {'GUID_HEADER_NAME': 'X-API-Request-Id'}
# Name of the default task queue