reduce parent->child lock contention

* We update the parent unified job template to point at new jobs
created. We also update a similar foreign key when the job finishes
running. This causes lock contention when the job template is
allow_simultaneous and there are a lot of jobs from that job template
running in parallel. I've seen as bad as 5 minutes waiting for the lock
when a job finishes.
* This change moves the parent->child update to OUTSIDE of the
transaction if the job is allow_simultaneous (inherited from the parent
unified job). We sacrafice a bit of correctness for performance. The
logic is, if you are launching 1,000 parallel jobs do you really care
that the job template contains a pointer to the last one you launched?
Probably not. If you do, you can always query jobs related to the job
template sorted by created time.
This commit is contained in:
Chris Meyers 2020-10-15 15:25:47 -04:00
parent 1d1e1787c4
commit 09a0448c3e

View File

@ -873,7 +873,13 @@ class UnifiedJob(PolymorphicModel, PasswordFieldsModel, CommonModelNameNotUnique
# If status changed, update the parent instance.
if self.status != status_before:
self._update_parent_instance()
# Update parent outside of the transaction for Job w/ allow_simultaneous=True
# This dodges lock contention at the expense of the foreign key not being
# completely correct.
if getattr(self, 'allow_simultaneous', False):
connection.on_commit(self._update_parent_instance)
else:
self._update_parent_instance()
# Done.
return result