mirror of
https://github.com/ansible/awx.git
synced 2026-06-23 15:47:49 -02:30
AAP-68024 perf: derive last_job_host_summary from query instead of denormalized FK (#16332)
* perf: stop eagerly updating Host.last_job_host_summary on every job completion The playbook_on_stats wrapup path bulk-updates last_job_host_summary_id on every host touched by a job. In the Q4CY25 scale lab this query had a median execution time of 75 seconds due to index churn on main_host. Replace all reads of the denormalized FK with a new classmethod JobHostSummary.latest_for_host(host_id) that queries for the most recent summary on demand. This eliminates the write-side bulk_update of last_job_host_summary_id entirely. Changes: - Add JobHostSummary.latest_for_host() classmethod - Serializer: use latest_for_host() instead of obj.last_job_host_summary - Dashboard view: use subquery instead of FK traversal for failed hosts - Inventory.update_computed_fields: use subquery for failed host count - events.py: remove last_job_host_summary_id from bulk_update - signals.py: simplify _update_host_last_jhs to only update last_job - access.py/managers.py: remove select_related/defer through the FK The FK field on Host is left in place for now (removal requires a migration) but is no longer written to. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Fix .pk AttributeError, add job_template annotations, annotate host sublists - Add 'pk' to AnnotatedSummary dynamic type (fixes AttributeError in get_related) - Add job_template_id and job_template_name to subquery annotations so list views include these fields in summary_fields.last_job (matching detail views) - Traverse job__ FK from JobHostSummary instead of using separate UnifiedJob subquery with OuterRef on another annotation (cleaner SQL, avoids alias issue) - Annotate all host sublist views (InventoryHostsList, GroupHostsList, GroupAllHostsList, InventorySourceHostsList) to prevent N+1 queries Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Update test_events to use JobHostSummary.latest_for_host instead of stale FKs Tests were asserting host.last_job_id and host.last_job_host_summary_id which are no longer updated. Use JobHostSummary.latest_for_host() to derive the same data, matching the new read-time derivation approach. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Remove stale failures_url from deprecated DashboardView The failures_url linked to ?last_job_host_summary__failed=True which filters on the now-stale FK. The dashboard count itself was already fixed to use a subquery annotation. Since DashboardView is deprecated and has_active_failures is a SerializerMethodField (not filterable), remove the failures_url entirely rather than creating a custom filter. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Apply black formatting to changed files Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Refactor: replace 10 subquery annotations with bulk prefetch Instead of annotating every host queryset with 10 correlated subqueries (summary + job + job_template fields), annotate only _latest_summary_id and bulk-fetch the full JobHostSummary objects after pagination via select_related('job', 'job__job_template'). This reduces the SQL from 10 correlated subqueries to 1 subquery + 1 IN query, addressing review feedback about annotation overhead on host list views. - _annotate_host_latest_summary: only annotates _latest_summary_id - _prefetch_latest_summaries: bulk-fetches and attaches to host objects - HostSummaryPrefetchMixin: hooks into list() after pagination - Serializer uses real JobHostSummary objects (no more AnnotatedSummary) - to_representation always overwrites stale FK values Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Refactor: move latest summary to QuerySet._fetch_all + Host.latest_summary Per review feedback, replace the view-level HostSummaryPrefetchMixin with a custom QuerySet that bulk-attaches summaries at evaluation time (like prefetch_related), and a Host.latest_summary property as the single access point. - HostLatestSummaryQuerySet: overrides _fetch_all() to bulk-fetch JobHostSummary objects with select_related after queryset evaluation - HostManager now inherits from the custom queryset via from_queryset() - Host.latest_summary property: uses cache if available, falls back to individual query - Remove _annotate_host_latest_summary, _prefetch_latest_summaries, HostSummaryPrefetchMixin from views — no more list() override needed - Remove last_job/last_job_host_summary from SUMMARIZABLE_FK_FIELDS - Serializer uses obj.latest_summary and DEFAULT_SUMMARY_FIELDS loop Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Fix: scope annotation to views, restore license_error/canceled_on - Remove with_latest_summary_id() from HostManager.get_queryset() to avoid applying the correlated subquery to every Host query globally (count, exists, internal relations) - Apply with_latest_summary_id() in get_queryset() of the 6 host-serving views only - Restore license_error and canceled_on to last_job summary fields to avoid breaking API change Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Guard _fetch_all() to skip bulk-attach on non-annotated querysets Without this guard, _fetch_all() would set _latest_summary_cache=None on every host in non-annotated querysets (e.g. Host.objects.filter()), masking the per-object fallback query in Host.latest_summary. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Remove name from last_job_host_summary and canceled_on from last_job summary Per reviewer feedback: these fields were not in the original API contract via SUMMARIZABLE_FK_FIELDS and their addition would be an API change. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Add functional tests for HostLatestSummaryQuerySet and Host.latest_summary Tests cover: - with_latest_summary_id() annotation and most-recent selection - _fetch_all() bulk-attach behavior on annotated querysets - _fetch_all() skips non-annotated querysets (preserves fallback) - .count() and .exists() do NOT trigger _fetch_all - Host.latest_summary cache hits (zero queries) and fallback - Host.latest_job property - select_related on bulk-attached summaries (no N+1) - Chaining preserves annotation - Multiple jobs / partial host coverage Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Apply black formatting to test_host_queryset.py Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Ben Thomasson <bthomass@redhat.com> * Fix flake8 F841: remove unused job1/job2 variables in tests Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Ben Thomasson <bthomass@redhat.com> * Add comment explaining why Prefetch was not used for host latest summary Django Prefetch cannot handle latest per group -- [:1] slicing fetches 1 record globally, not per host (Django ticket #26780). The custom _fetch_all override uses the same 2-query pattern as prefetch_related internally, customized for this use case. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Fix null handling to keep old behavior --------- Signed-off-by: Ben Thomasson <bthomass@redhat.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: AlanCoding <arominge@redhat.com>
This commit is contained in:
@@ -71,8 +71,10 @@ class TestEvents:
|
||||
assert s.skipped == 0
|
||||
|
||||
for host in Host.objects.all():
|
||||
assert host.last_job_id == self.job.id
|
||||
assert host.last_job_host_summary.host == host
|
||||
latest_summary = JobHostSummary.latest_for_host(host.id)
|
||||
assert latest_summary is not None
|
||||
assert latest_summary.job_id == self.job.id
|
||||
assert latest_summary.host == host
|
||||
|
||||
def test_host_summary_generation_with_deleted_hosts(self):
|
||||
self._generate_hosts(10)
|
||||
@@ -91,8 +93,7 @@ class TestEvents:
|
||||
def test_host_summary_generation_with_limit(self):
|
||||
# Make an inventory with 10 hosts, run a playbook with a --limit
|
||||
# pointed at *one* host,
|
||||
# Verify that *only* that host has an associated JobHostSummary and that
|
||||
# *only* that host has an updated value for .last_job.
|
||||
# Verify that *only* that host has an associated JobHostSummary.
|
||||
self._generate_hosts(10)
|
||||
|
||||
# by making the playbook_on_stats *only* include Host 1, we're emulating
|
||||
@@ -105,13 +106,14 @@ class TestEvents:
|
||||
# be related to the appropriate Host)
|
||||
assert JobHostSummary.objects.count() == 1
|
||||
for h in Host.objects.all():
|
||||
latest_summary = JobHostSummary.latest_for_host(h.id)
|
||||
if h.name == 'Host 1':
|
||||
assert h.last_job_id == self.job.id
|
||||
assert h.last_job_host_summary_id == JobHostSummary.objects.first().id
|
||||
assert latest_summary is not None
|
||||
assert latest_summary.job_id == self.job.id
|
||||
assert latest_summary.id == JobHostSummary.objects.first().id
|
||||
else:
|
||||
# all other hosts in the inventory should remain untouched
|
||||
assert h.last_job_id is None
|
||||
assert h.last_job_host_summary_id is None
|
||||
# all other hosts in the inventory should have no summary
|
||||
assert latest_summary is None
|
||||
|
||||
def test_host_metrics_insert(self):
|
||||
self._generate_hosts(10)
|
||||
|
||||
213
awx/main/tests/functional/models/test_host_queryset.py
Normal file
213
awx/main/tests/functional/models/test_host_queryset.py
Normal file
@@ -0,0 +1,213 @@
|
||||
import pytest
|
||||
|
||||
from django.test.utils import CaptureQueriesContext
|
||||
from django.db import connection
|
||||
from django.utils.timezone import now
|
||||
|
||||
from awx.main.models import Job, JobEvent, Inventory, Host, JobHostSummary
|
||||
|
||||
|
||||
@pytest.mark.django_db
|
||||
class TestHostLatestSummaryQuerySet:
|
||||
"""Tests for HostLatestSummaryQuerySet and Host.latest_summary property."""
|
||||
|
||||
def _create_inventory_with_hosts(self, count=5):
|
||||
inventory = Inventory()
|
||||
inventory.save()
|
||||
Host.objects.bulk_create([Host(created=now(), modified=now(), name=f'host-{i}', inventory_id=inventory.id) for i in range(count)])
|
||||
return inventory
|
||||
|
||||
def _run_job(self, inventory, host_names=None):
|
||||
"""Run a fake job that creates JobHostSummary records for the given hosts."""
|
||||
if host_names is None:
|
||||
host_names = list(inventory.hosts.values_list('name', flat=True))
|
||||
job = Job(inventory=inventory)
|
||||
job.save()
|
||||
host_map = dict(inventory.hosts.values_list('name', 'id'))
|
||||
JobEvent.create_from_data(
|
||||
job_id=job.pk,
|
||||
parent_uuid='abc123',
|
||||
event='playbook_on_stats',
|
||||
event_data={
|
||||
'ok': {name: 1 for name in host_names},
|
||||
'changed': {},
|
||||
'dark': {},
|
||||
'failures': {},
|
||||
'ignored': {},
|
||||
'processed': {},
|
||||
'rescued': {},
|
||||
'skipped': {},
|
||||
},
|
||||
host_map=host_map,
|
||||
).save()
|
||||
return job
|
||||
|
||||
def test_with_latest_summary_id_annotates_hosts(self):
|
||||
inventory = self._create_inventory_with_hosts(3)
|
||||
job = self._run_job(inventory)
|
||||
|
||||
hosts = Host.objects.filter(inventory=inventory).with_latest_summary_id()
|
||||
for host in hosts:
|
||||
assert hasattr(host, '_latest_summary_id')
|
||||
summary = JobHostSummary.objects.filter(host=host, job=job).first()
|
||||
assert host._latest_summary_id == summary.id
|
||||
|
||||
def test_with_latest_summary_id_returns_most_recent(self):
|
||||
inventory = self._create_inventory_with_hosts(1)
|
||||
self._run_job(inventory)
|
||||
job2 = self._run_job(inventory)
|
||||
|
||||
host = Host.objects.filter(inventory=inventory).with_latest_summary_id().first()
|
||||
latest = JobHostSummary.objects.filter(host_id=host.id).order_by('-id').first()
|
||||
assert latest.job_id == job2.id
|
||||
assert host._latest_summary_id == latest.id
|
||||
|
||||
def test_with_latest_summary_id_none_for_no_summaries(self):
|
||||
inventory = self._create_inventory_with_hosts(1)
|
||||
# No job run — no summaries
|
||||
host = Host.objects.filter(inventory=inventory).with_latest_summary_id().first()
|
||||
assert host._latest_summary_id is None
|
||||
|
||||
def test_fetch_all_bulk_attaches_summaries(self):
|
||||
inventory = self._create_inventory_with_hosts(5)
|
||||
self._run_job(inventory)
|
||||
|
||||
hosts = list(Host.objects.filter(inventory=inventory).with_latest_summary_id())
|
||||
for host in hosts:
|
||||
assert hasattr(host, '_latest_summary_cache')
|
||||
assert host._latest_summary_cache is not None
|
||||
assert isinstance(host._latest_summary_cache, JobHostSummary)
|
||||
|
||||
def test_fetch_all_skips_non_annotated_querysets(self):
|
||||
"""Non-annotated querysets should NOT set _latest_summary_cache,
|
||||
preserving the per-object fallback in Host.latest_summary."""
|
||||
inventory = self._create_inventory_with_hosts(3)
|
||||
self._run_job(inventory)
|
||||
|
||||
hosts = list(Host.objects.filter(inventory=inventory))
|
||||
for host in hosts:
|
||||
assert not hasattr(host, '_latest_summary_cache')
|
||||
|
||||
def test_count_does_not_trigger_fetch_all(self):
|
||||
"""Calling .count() should not trigger _fetch_all or the bulk-attach logic."""
|
||||
inventory = self._create_inventory_with_hosts(5)
|
||||
self._run_job(inventory)
|
||||
|
||||
qs = Host.objects.filter(inventory=inventory).with_latest_summary_id()
|
||||
with CaptureQueriesContext(connection) as ctx:
|
||||
result = qs.count()
|
||||
|
||||
assert result == 5
|
||||
# count() should produce a single COUNT query, not fetch all rows + summaries
|
||||
assert len(ctx.captured_queries) == 1
|
||||
assert 'COUNT' in ctx.captured_queries[0]['sql'].upper()
|
||||
|
||||
def test_exists_does_not_trigger_fetch_all(self):
|
||||
inventory = self._create_inventory_with_hosts(1)
|
||||
self._run_job(inventory)
|
||||
|
||||
qs = Host.objects.filter(inventory=inventory).with_latest_summary_id()
|
||||
with CaptureQueriesContext(connection) as ctx:
|
||||
result = qs.exists()
|
||||
|
||||
assert result is True
|
||||
assert len(ctx.captured_queries) == 1
|
||||
|
||||
def test_latest_summary_property_uses_cache(self):
|
||||
"""When loaded via with_latest_summary_id(), Host.latest_summary
|
||||
should use the bulk-attached cache without extra queries."""
|
||||
inventory = self._create_inventory_with_hosts(3)
|
||||
self._run_job(inventory)
|
||||
|
||||
hosts = list(Host.objects.filter(inventory=inventory).with_latest_summary_id())
|
||||
|
||||
with CaptureQueriesContext(connection) as ctx:
|
||||
for host in hosts:
|
||||
summary = host.latest_summary
|
||||
assert summary is not None
|
||||
|
||||
# No additional queries — all data came from the bulk-attach
|
||||
assert len(ctx.captured_queries) == 0
|
||||
|
||||
def test_latest_summary_property_fallback(self):
|
||||
"""When loaded without annotation, Host.latest_summary should
|
||||
fall back to a per-object query."""
|
||||
inventory = self._create_inventory_with_hosts(1)
|
||||
job = self._run_job(inventory)
|
||||
|
||||
host = Host.objects.filter(inventory=inventory).first()
|
||||
assert not hasattr(host, '_latest_summary_cache')
|
||||
|
||||
summary = host.latest_summary
|
||||
assert summary is not None
|
||||
assert summary.job_id == job.id
|
||||
# After first access, the cache should be populated
|
||||
assert hasattr(host, '_latest_summary_cache')
|
||||
|
||||
def test_latest_summary_none_when_no_summaries(self):
|
||||
inventory = self._create_inventory_with_hosts(1)
|
||||
host = Host.objects.filter(inventory=inventory).with_latest_summary_id().first()
|
||||
assert host.latest_summary is None
|
||||
|
||||
def test_latest_job_property(self):
|
||||
inventory = self._create_inventory_with_hosts(1)
|
||||
job = self._run_job(inventory)
|
||||
|
||||
host = Host.objects.filter(inventory=inventory).with_latest_summary_id().first()
|
||||
assert host.latest_job is not None
|
||||
assert host.latest_job.id == job.id
|
||||
|
||||
def test_latest_job_none_when_no_summaries(self):
|
||||
inventory = self._create_inventory_with_hosts(1)
|
||||
host = Host.objects.filter(inventory=inventory).first()
|
||||
assert host.latest_job is None
|
||||
|
||||
def test_bulk_attach_select_related(self):
|
||||
"""The bulk-attach should select_related job and job__job_template
|
||||
so accessing them doesn't cause extra queries."""
|
||||
inventory = self._create_inventory_with_hosts(3)
|
||||
self._run_job(inventory)
|
||||
|
||||
hosts = list(Host.objects.filter(inventory=inventory).with_latest_summary_id())
|
||||
|
||||
with CaptureQueriesContext(connection) as ctx:
|
||||
for host in hosts:
|
||||
summary = host.latest_summary
|
||||
_ = summary.job # should not query
|
||||
|
||||
assert len(ctx.captured_queries) == 0
|
||||
|
||||
def test_chaining_preserves_annotation(self):
|
||||
"""Chaining .filter() after .with_latest_summary_id() should
|
||||
preserve the annotation and bulk-attach behavior."""
|
||||
inventory = self._create_inventory_with_hosts(5)
|
||||
self._run_job(inventory)
|
||||
|
||||
hosts = list(Host.objects.filter(inventory=inventory).with_latest_summary_id().filter(name__startswith='host-').order_by('name'))
|
||||
assert len(hosts) == 5
|
||||
for host in hosts:
|
||||
assert hasattr(host, '_latest_summary_cache')
|
||||
assert host._latest_summary_cache is not None
|
||||
|
||||
def test_multiple_jobs_latest_wins(self):
|
||||
"""After multiple jobs, latest_summary should return the most recent."""
|
||||
inventory = self._create_inventory_with_hosts(1)
|
||||
self._run_job(inventory)
|
||||
self._run_job(inventory)
|
||||
job3 = self._run_job(inventory)
|
||||
|
||||
host = Host.objects.filter(inventory=inventory).with_latest_summary_id().first()
|
||||
assert host.latest_summary.job_id == job3.id
|
||||
|
||||
def test_partial_host_coverage(self):
|
||||
"""When a job only touches some hosts, only those hosts get summaries."""
|
||||
inventory = self._create_inventory_with_hosts(5)
|
||||
self._run_job(inventory, host_names=['host-0', 'host-1'])
|
||||
|
||||
hosts = list(Host.objects.filter(inventory=inventory).with_latest_summary_id().order_by('name'))
|
||||
with_summary = [h for h in hosts if h.latest_summary is not None]
|
||||
without_summary = [h for h in hosts if h.latest_summary is None]
|
||||
|
||||
assert len(with_summary) == 2
|
||||
assert len(without_summary) == 3
|
||||
assert sorted([h.name for h in with_summary]) == ['host-0', 'host-1']
|
||||
111
awx/main/tests/functional/models/test_host_summary_fields.py
Normal file
111
awx/main/tests/functional/models/test_host_summary_fields.py
Normal file
@@ -0,0 +1,111 @@
|
||||
import pytest
|
||||
|
||||
from django.utils.timezone import now
|
||||
|
||||
from awx.main.models import Job, JobEvent, JobTemplate, Inventory, Host, JobHostSummary, Project
|
||||
from awx.api.serializers import HostSerializer
|
||||
|
||||
|
||||
@pytest.mark.django_db
|
||||
class TestHostSummaryFields:
|
||||
"""Tests for summary_fields of last_job and last_job_host_summary on HostSerializer."""
|
||||
|
||||
def _setup_host_with_job(self, status='canceled'):
|
||||
inventory = Inventory()
|
||||
inventory.save()
|
||||
host = Host(created=now(), modified=now(), name='test-host', inventory=inventory)
|
||||
host.save()
|
||||
|
||||
project = Project(name='test-project')
|
||||
project.save()
|
||||
jt = JobTemplate(name='test-jt', inventory=inventory, project=project)
|
||||
jt.save()
|
||||
|
||||
job = Job(inventory=inventory, job_template=jt, status=status)
|
||||
if status in ('successful', 'failed', 'canceled', 'error'):
|
||||
job.finished = now()
|
||||
if status == 'canceled':
|
||||
job.canceled_on = now()
|
||||
job.save()
|
||||
|
||||
host_map = {host.name: host.id}
|
||||
JobEvent.create_from_data(
|
||||
job_id=job.pk,
|
||||
parent_uuid='abc123',
|
||||
event='playbook_on_stats',
|
||||
event_data={
|
||||
'ok': {host.name: 1},
|
||||
'changed': {},
|
||||
'dark': {},
|
||||
'failures': {},
|
||||
'ignored': {},
|
||||
'processed': {},
|
||||
'rescued': {},
|
||||
'skipped': {},
|
||||
},
|
||||
host_map=host_map,
|
||||
).save()
|
||||
|
||||
summary = JobHostSummary.objects.filter(host=host, job=job).first()
|
||||
host.last_job = job
|
||||
host.last_job_host_summary = summary
|
||||
host.save(update_fields=['last_job', 'last_job_host_summary'])
|
||||
host.refresh_from_db()
|
||||
|
||||
return host, job, summary
|
||||
|
||||
def test_last_job_summary_fields_canceled_job(self):
|
||||
host, job, summary = self._setup_host_with_job(status='canceled')
|
||||
|
||||
serializer = HostSerializer()
|
||||
d = serializer.get_summary_fields(host)
|
||||
|
||||
assert 'last_job' in d
|
||||
last_job = d['last_job']
|
||||
|
||||
expected_keys = {'id', 'name', 'description', 'finished', 'status', 'failed', 'canceled_on', 'job_template_id', 'job_template_name'}
|
||||
assert set(last_job.keys()) == expected_keys, f"Unexpected last_job keys: {set(last_job.keys())}"
|
||||
assert last_job['id'] == job.id
|
||||
assert last_job['status'] == 'canceled'
|
||||
assert last_job['canceled_on'] == job.canceled_on
|
||||
assert last_job['job_template_id'] == job.job_template.id
|
||||
assert last_job['job_template_name'] == job.job_template.name
|
||||
|
||||
def test_last_job_summary_fields_successful_job(self):
|
||||
host, job, summary = self._setup_host_with_job(status='successful')
|
||||
|
||||
serializer = HostSerializer()
|
||||
d = serializer.get_summary_fields(host)
|
||||
|
||||
assert 'last_job' in d
|
||||
last_job = d['last_job']
|
||||
|
||||
expected_keys = {'id', 'name', 'description', 'finished', 'status', 'failed', 'job_template_id', 'job_template_name'}
|
||||
assert set(last_job.keys()) == expected_keys, f"Unexpected last_job keys: {set(last_job.keys())}"
|
||||
assert last_job['id'] == job.id
|
||||
assert last_job['status'] == 'successful'
|
||||
assert 'canceled_on' not in last_job, "canceled_on should not appear when None"
|
||||
|
||||
def test_last_job_host_summary_fields(self):
|
||||
host, job, summary = self._setup_host_with_job(status='successful')
|
||||
|
||||
serializer = HostSerializer()
|
||||
d = serializer.get_summary_fields(host)
|
||||
|
||||
assert 'last_job_host_summary' in d
|
||||
last_jhs = d['last_job_host_summary']
|
||||
|
||||
assert last_jhs['id'] == summary.id
|
||||
assert 'failed' in last_jhs
|
||||
|
||||
def test_no_summary_fields_without_job(self):
|
||||
inventory = Inventory()
|
||||
inventory.save()
|
||||
host = Host(created=now(), modified=now(), name='lonely-host', inventory=inventory)
|
||||
host.save()
|
||||
|
||||
serializer = HostSerializer()
|
||||
d = serializer.get_summary_fields(host)
|
||||
|
||||
assert 'last_job' not in d
|
||||
assert 'last_job_host_summary' not in d
|
||||
Reference in New Issue
Block a user