mirror of
https://github.com/ansible/awx.git
synced 2026-03-04 10:11:05 -03:30
Improve transactional integrity for starting controller jobs in dispatcherd (#16300)

Remove SELECT FOR UPDATE from job dispatch to reduce transaction rollbacks

Move status transition from BaseTask.transition_status (which used
SELECT FOR UPDATE inside transaction.atomic()) into
dispatch_waiting_jobs. The new approach uses filter().update(), which
is atomic at the database level without requiring explicit row locks,
reducing the transaction contention and rollbacks observed in perfscale
testing.

The transition_status method was an artifact of the feature flag era,
when we needed to support both old and new code paths. Since
dispatch_waiting_jobs is already a singleton
(on_duplicate='queue_one') scoped to the local node, the
de-duplication logic is unnecessary.

Status is updated after task submission to dispatcherd, so the job's
UUID is in the dispatch pipeline before the job is marked running.
This prevents the reaper from incorrectly reaping jobs during the
handoff window. RunJob.run() handles the race where a worker picks
up the task before the status update lands by accepting a job in
'waiting' status and transitioning it to 'running' itself.

Signed-off-by: Seth Foster <fosterbseth@gmail.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
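
To illustrate why a conditional update needs no explicit row lock, here is a minimal sketch. It uses the standard-library sqlite3 module as a stand-in for the Django ORM's filter().update(), and the `job` table, its columns, and the `mark_running` helper are hypothetical names chosen for the example, not AWX code. The point is only that a single UPDATE with a status condition in its WHERE clause changes each row at most once.

```python
import sqlite3


def mark_running(conn, job_ids):
    """Flip jobs from 'waiting' to 'running' with one conditional UPDATE.

    Returns the number of rows actually changed. Rows that are no longer
    'waiting' are left alone, so a repeated call is a no-op.
    (Hypothetical helper; stands in for filter(...).update(...).)
    """
    if not job_ids:
        return 0
    placeholders = ",".join("?" for _ in job_ids)
    cur = conn.execute(
        "UPDATE job SET status = 'running' "
        f"WHERE status = 'waiting' AND id IN ({placeholders})",
        list(job_ids),
    )
    conn.commit()
    return cur.rowcount


# Hypothetical schema: one table with an id and a status column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE job (id INTEGER PRIMARY KEY, status TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO job (id, status) VALUES (?, ?)",
    [(1, "waiting"), (2, "waiting"), (3, "canceled")],
)
conn.commit()

first = mark_running(conn, [1, 2, 3])   # only the two waiting jobs change
second = mark_running(conn, [1, 2, 3])  # nothing is left in 'waiting'
statuses = dict(conn.execute("SELECT id, status FROM job ORDER BY id"))
```

The returned row count tells the caller how many jobs were actually transitioned, which is the same information Django's QuerySet.update() returns.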
@@ -29,3 +29,30 @@ def test_cancel_flag_on_start(jt_linked, caplog):
 
     job = Job.objects.get(id=job.id)
     assert job.status == 'canceled'
+
+
+@pytest.mark.django_db
+def test_runjob_run_can_accept_waiting_status(jt_linked, mocker):
+    """Test that RunJob.run() can accept a job in 'waiting' status and transition it to 'running'
+    before the pre_run_hook is called"""
+    job = jt_linked.create_unified_job()
+    job.status = 'waiting'
+    job.save()
+
+    status_at_pre_run = None
+
+    def capture_status(instance, private_data_dir):
+        nonlocal status_at_pre_run
+        instance.refresh_from_db()
+        status_at_pre_run = instance.status
+
+    mock_pre_run = mocker.patch.object(RunJob, 'pre_run_hook', side_effect=capture_status)
+
+    task = RunJob()
+    try:
+        task.run(job.id)
+    except Exception:
+        pass
+
+    mock_pre_run.assert_called_once()
+    assert status_at_pre_run == 'running'
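
The test above covers the worker-side half of the handoff: a worker may pick up a job that is still 'waiting'. A rough sketch of that acceptance logic follows. `FakeJob` and `start_job` are hypothetical names for illustration only; the real logic lives in RunJob.run() and may differ. Skipping jobs in any other status (for example 'canceled') is an assumption of this sketch.

```python
from dataclasses import dataclass


@dataclass
class FakeJob:
    """Hypothetical stand-in for a job row; not an AWX model."""

    status: str


def start_job(job, pre_run_hook):
    """Accept a job that is 'waiting' or already 'running', then call the hook.

    Returns True if the job was started, False if it was skipped.
    """
    if job.status == "waiting":
        # The dispatcher's status update has not landed yet, so the
        # worker performs the transition itself.
        job.status = "running"
    if job.status != "running":
        # Assumption: any other status (e.g. 'canceled') is not started.
        return False
    pre_run_hook(job)
    return True


seen = []  # statuses observed by the hook, mirroring capture_status in the test

early = FakeJob(status="waiting")
started_early = start_job(early, lambda j: seen.append(j.status))

canceled = FakeJob(status="canceled")
started_canceled = start_job(canceled, lambda j: seen.append(j.status))
```

As in the test, the hook only ever observes 'running', regardless of whether the dispatcher or the worker performed the transition.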