periodically run orphaned task cleanup as part of the scheduler

Running orphaned task cleanup as its own scheduled task via celery-beat
causes lock contention between the cleanup task and the task scheduler.
Because the scheduler and the cleanup task both run at similar
intervals, this race condition is fairly easy to hit.  At best, the
scheduler is regularly delayed by 20s; depending on timing, task
execution can be needlessly delayed by a minute or more.  At worst, the
scheduler is never able to schedule tasks.

This change implements the cleanup as a periodic block of code in the
scheduler itself that tracks its "last run" time in memcached (by
default, it performs a cleanup every 60 seconds).
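
A minimal sketch of the "last run" check described above, not the code
from this commit.  The cache class, key name, and function names are all
hypothetical; the in-memory cache stands in for memcached, and the sketch
assumes the check runs while the scheduler already holds its lock, so the
get-then-set is not made atomic here.

```python
import time


class InMemoryCache:
    """Hypothetical stand-in for the memcached-backed cache (get/set only)."""

    def __init__(self):
        self._data = {}

    def get(self, key, default=None):
        return self._data.get(key, default)

    def set(self, key, value):
        self._data[key] = value


LAST_CLEANUP_KEY = "last_orphaned_task_cleanup"  # hypothetical key name
CLEANUP_INTERVAL = 60  # seconds; the default stated in the commit message


def cleanup_is_due(cache, now=None, interval=CLEANUP_INTERVAL):
    """Return True and record `now` if `interval` seconds have elapsed."""
    now = time.time() if now is None else now
    last_run = cache.get(LAST_CLEANUP_KEY)
    if last_run is not None and now - last_run < interval:
        return False
    cache.set(LAST_CLEANUP_KEY, now)
    return True


def schedule(cache, cleanup, now=None):
    """One scheduler pass: run the cleanup inline when due, then schedule."""
    if cleanup_is_due(cache, now=now):
        cleanup()
    # ... normal task scheduling would continue here ...
```

Because the cleanup runs inside the scheduler pass, it never competes
with the scheduler for the lock; it only adds time to the passes on which
it is due.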

see: #6534
Ryan Petrello
2017-07-10 14:03:07 -04:00
parent 35e28e9347
commit 0e29f3617d
4 changed files with 61 additions and 45 deletions

@@ -432,9 +432,7 @@ CELERY_QUEUES = (
Queue('tower_scheduler', Exchange('scheduler', type='topic'), routing_key='tower_scheduler.job.#', durable=False),
Broadcast('tower_broadcast_all')
)
-CELERY_ROUTES = {'awx.main.scheduler.tasks.run_fail_inconsistent_running_jobs': {'queue': 'tower',
-'routing_key': 'tower'},
-'awx.main.scheduler.tasks.run_task_manager': {'queue': 'tower',
+CELERY_ROUTES = {'awx.main.scheduler.tasks.run_task_manager': {'queue': 'tower',
'routing_key': 'tower'},
'awx.main.scheduler.tasks.run_job_launch': {'queue': 'tower_scheduler',
'routing_key': 'tower_scheduler.job.launch'},
@@ -473,12 +471,8 @@ CELERYBEAT_SCHEDULE = {
'schedule': timedelta(seconds=20),
'options': {'expires': 20,}
},
-'task_fail_inconsistent_running_jobs': {
-'task': 'awx.main.scheduler.tasks.run_fail_inconsistent_running_jobs',
-'schedule': timedelta(seconds=30),
-'options': {'expires': 20,}
-},
}
+AWX_INCONSISTENT_TASK_INTERVAL = 60 * 3
# Django Caching Configuration
if is_testing():