Files
awx/awx/main/tests/functional/models/test_host_queryset.py
Ben Thomasson d1b3ae53ae AAP-68024 perf: derive last_job_host_summary from query instead of denormalized FK (#16332)
* perf: stop eagerly updating Host.last_job_host_summary on every job completion

The playbook_on_stats wrapup path bulk-updates last_job_host_summary_id
on every host touched by a job. In the Q4CY25 scale lab this query had
a median execution time of 75 seconds due to index churn on main_host.

Replace all reads of the denormalized FK with a new classmethod
JobHostSummary.latest_for_host(host_id) that queries for the most
recent summary on demand. This eliminates the write-side bulk_update
of last_job_host_summary_id entirely.

Changes:
- Add JobHostSummary.latest_for_host() classmethod
- Serializer: use latest_for_host() instead of obj.last_job_host_summary
- Dashboard view: use subquery instead of FK traversal for failed hosts
- Inventory.update_computed_fields: use subquery for failed host count
- events.py: remove last_job_host_summary_id from bulk_update
- signals.py: simplify _update_host_last_jhs to only update last_job
- access.py/managers.py: remove select_related/defer through the FK

The FK field on Host is left in place for now (removal requires a
migration) but is no longer written to.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Fix .pk AttributeError, add job_template annotations, annotate host sublists

- Add 'pk' to AnnotatedSummary dynamic type (fixes AttributeError in get_related)
- Add job_template_id and job_template_name to subquery annotations so list
  views include these fields in summary_fields.last_job (matching detail views)
- Traverse job__ FK from JobHostSummary instead of using separate UnifiedJob
  subquery with OuterRef on another annotation (cleaner SQL, avoids alias issue)
- Annotate all host sublist views (InventoryHostsList, GroupHostsList,
  GroupAllHostsList, InventorySourceHostsList) to prevent N+1 queries

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Update test_events to use JobHostSummary.latest_for_host instead of stale FKs

Tests were asserting host.last_job_id and host.last_job_host_summary_id
which are no longer updated. Use JobHostSummary.latest_for_host() to
derive the same data, matching the new read-time derivation approach.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Remove stale failures_url from deprecated DashboardView

The failures_url linked to ?last_job_host_summary__failed=True which
filters on the now-stale FK. The dashboard count itself was already
fixed to use a subquery annotation. Since DashboardView is deprecated
and has_active_failures is a SerializerMethodField (not filterable),
remove the failures_url entirely rather than creating a custom filter.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Apply black formatting to changed files

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Refactor: replace 10 subquery annotations with bulk prefetch

Instead of annotating every host queryset with 10 correlated subqueries
(summary + job + job_template fields), annotate only _latest_summary_id
and bulk-fetch the full JobHostSummary objects after pagination via
select_related('job', 'job__job_template').

This reduces the SQL from 10 correlated subqueries to 1 subquery + 1 IN
query, addressing review feedback about annotation overhead on host list
views.

- _annotate_host_latest_summary: only annotates _latest_summary_id
- _prefetch_latest_summaries: bulk-fetches and attaches to host objects
- HostSummaryPrefetchMixin: hooks into list() after pagination
- Serializer uses real JobHostSummary objects (no more AnnotatedSummary)
- to_representation always overwrites stale FK values

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Refactor: move latest summary to QuerySet._fetch_all + Host.latest_summary

Per review feedback, replace the view-level HostSummaryPrefetchMixin
with a custom QuerySet that bulk-attaches summaries at evaluation time
(like prefetch_related), and a Host.latest_summary property as the
single access point.

- HostLatestSummaryQuerySet: overrides _fetch_all() to bulk-fetch
  JobHostSummary objects with select_related after queryset evaluation
- HostManager now inherits from the custom queryset via from_queryset()
- Host.latest_summary property: uses cache if available, falls back to
  individual query
- Remove _annotate_host_latest_summary, _prefetch_latest_summaries,
  HostSummaryPrefetchMixin from views — no more list() override needed
- Remove last_job/last_job_host_summary from SUMMARIZABLE_FK_FIELDS
- Serializer uses obj.latest_summary and DEFAULT_SUMMARY_FIELDS loop

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Fix: scope annotation to views, restore license_error/canceled_on

- Remove with_latest_summary_id() from HostManager.get_queryset() to
  avoid applying the correlated subquery to every Host query globally
  (count, exists, internal relations)
- Apply with_latest_summary_id() in get_queryset() of the 6
  host-serving views only
- Restore license_error and canceled_on to last_job summary fields
  to avoid breaking API change

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Guard _fetch_all() to skip bulk-attach on non-annotated querysets

Without this guard, _fetch_all() would set _latest_summary_cache=None
on every host in non-annotated querysets (e.g. Host.objects.filter()),
masking the per-object fallback query in Host.latest_summary.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Remove name from last_job_host_summary and canceled_on from last_job summary

Per reviewer feedback: these fields were not in the original API contract
via SUMMARIZABLE_FK_FIELDS and their addition would be an API change.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Add functional tests for HostLatestSummaryQuerySet and Host.latest_summary

Tests cover:
- with_latest_summary_id() annotation and most-recent selection
- _fetch_all() bulk-attach behavior on annotated querysets
- _fetch_all() skips non-annotated querysets (preserves fallback)
- .count() and .exists() do NOT trigger _fetch_all
- Host.latest_summary cache hits (zero queries) and fallback
- Host.latest_job property
- select_related on bulk-attached summaries (no N+1)
- Chaining preserves annotation
- Multiple jobs / partial host coverage

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Apply black formatting to test_host_queryset.py

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Ben Thomasson <bthomass@redhat.com>

* Fix flake8 F841: remove unused job1/job2 variables in tests

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Ben Thomasson <bthomass@redhat.com>

* Add comment explaining why Prefetch was not used for host latest summary

Django Prefetch cannot handle latest per group -- [:1] slicing fetches
1 record globally, not per host (Django ticket #26780). The custom
_fetch_all override uses the same 2-query pattern as prefetch_related
internally, customized for this use case.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Fix null handling to keep old behavior

---------

Signed-off-by: Ben Thomasson <bthomass@redhat.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: AlanCoding <arominge@redhat.com>
2026-04-28 10:47:22 -04:00

214 lines
8.8 KiB
Python

import pytest
from django.test.utils import CaptureQueriesContext
from django.db import connection
from django.utils.timezone import now
from awx.main.models import Job, JobEvent, Inventory, Host, JobHostSummary
@pytest.mark.django_db
class TestHostLatestSummaryQuerySet:
"""Tests for HostLatestSummaryQuerySet and Host.latest_summary property."""
def _create_inventory_with_hosts(self, count=5):
inventory = Inventory()
inventory.save()
Host.objects.bulk_create([Host(created=now(), modified=now(), name=f'host-{i}', inventory_id=inventory.id) for i in range(count)])
return inventory
def _run_job(self, inventory, host_names=None):
"""Run a fake job that creates JobHostSummary records for the given hosts."""
if host_names is None:
host_names = list(inventory.hosts.values_list('name', flat=True))
job = Job(inventory=inventory)
job.save()
host_map = dict(inventory.hosts.values_list('name', 'id'))
JobEvent.create_from_data(
job_id=job.pk,
parent_uuid='abc123',
event='playbook_on_stats',
event_data={
'ok': {name: 1 for name in host_names},
'changed': {},
'dark': {},
'failures': {},
'ignored': {},
'processed': {},
'rescued': {},
'skipped': {},
},
host_map=host_map,
).save()
return job
def test_with_latest_summary_id_annotates_hosts(self):
inventory = self._create_inventory_with_hosts(3)
job = self._run_job(inventory)
hosts = Host.objects.filter(inventory=inventory).with_latest_summary_id()
for host in hosts:
assert hasattr(host, '_latest_summary_id')
summary = JobHostSummary.objects.filter(host=host, job=job).first()
assert host._latest_summary_id == summary.id
def test_with_latest_summary_id_returns_most_recent(self):
inventory = self._create_inventory_with_hosts(1)
self._run_job(inventory)
job2 = self._run_job(inventory)
host = Host.objects.filter(inventory=inventory).with_latest_summary_id().first()
latest = JobHostSummary.objects.filter(host_id=host.id).order_by('-id').first()
assert latest.job_id == job2.id
assert host._latest_summary_id == latest.id
def test_with_latest_summary_id_none_for_no_summaries(self):
inventory = self._create_inventory_with_hosts(1)
# No job run — no summaries
host = Host.objects.filter(inventory=inventory).with_latest_summary_id().first()
assert host._latest_summary_id is None
def test_fetch_all_bulk_attaches_summaries(self):
inventory = self._create_inventory_with_hosts(5)
self._run_job(inventory)
hosts = list(Host.objects.filter(inventory=inventory).with_latest_summary_id())
for host in hosts:
assert hasattr(host, '_latest_summary_cache')
assert host._latest_summary_cache is not None
assert isinstance(host._latest_summary_cache, JobHostSummary)
def test_fetch_all_skips_non_annotated_querysets(self):
"""Non-annotated querysets should NOT set _latest_summary_cache,
preserving the per-object fallback in Host.latest_summary."""
inventory = self._create_inventory_with_hosts(3)
self._run_job(inventory)
hosts = list(Host.objects.filter(inventory=inventory))
for host in hosts:
assert not hasattr(host, '_latest_summary_cache')
def test_count_does_not_trigger_fetch_all(self):
"""Calling .count() should not trigger _fetch_all or the bulk-attach logic."""
inventory = self._create_inventory_with_hosts(5)
self._run_job(inventory)
qs = Host.objects.filter(inventory=inventory).with_latest_summary_id()
with CaptureQueriesContext(connection) as ctx:
result = qs.count()
assert result == 5
# count() should produce a single COUNT query, not fetch all rows + summaries
assert len(ctx.captured_queries) == 1
assert 'COUNT' in ctx.captured_queries[0]['sql'].upper()
def test_exists_does_not_trigger_fetch_all(self):
inventory = self._create_inventory_with_hosts(1)
self._run_job(inventory)
qs = Host.objects.filter(inventory=inventory).with_latest_summary_id()
with CaptureQueriesContext(connection) as ctx:
result = qs.exists()
assert result is True
assert len(ctx.captured_queries) == 1
def test_latest_summary_property_uses_cache(self):
"""When loaded via with_latest_summary_id(), Host.latest_summary
should use the bulk-attached cache without extra queries."""
inventory = self._create_inventory_with_hosts(3)
self._run_job(inventory)
hosts = list(Host.objects.filter(inventory=inventory).with_latest_summary_id())
with CaptureQueriesContext(connection) as ctx:
for host in hosts:
summary = host.latest_summary
assert summary is not None
# No additional queries — all data came from the bulk-attach
assert len(ctx.captured_queries) == 0
def test_latest_summary_property_fallback(self):
"""When loaded without annotation, Host.latest_summary should
fall back to a per-object query."""
inventory = self._create_inventory_with_hosts(1)
job = self._run_job(inventory)
host = Host.objects.filter(inventory=inventory).first()
assert not hasattr(host, '_latest_summary_cache')
summary = host.latest_summary
assert summary is not None
assert summary.job_id == job.id
# After first access, the cache should be populated
assert hasattr(host, '_latest_summary_cache')
def test_latest_summary_none_when_no_summaries(self):
inventory = self._create_inventory_with_hosts(1)
host = Host.objects.filter(inventory=inventory).with_latest_summary_id().first()
assert host.latest_summary is None
def test_latest_job_property(self):
inventory = self._create_inventory_with_hosts(1)
job = self._run_job(inventory)
host = Host.objects.filter(inventory=inventory).with_latest_summary_id().first()
assert host.latest_job is not None
assert host.latest_job.id == job.id
def test_latest_job_none_when_no_summaries(self):
inventory = self._create_inventory_with_hosts(1)
host = Host.objects.filter(inventory=inventory).first()
assert host.latest_job is None
def test_bulk_attach_select_related(self):
"""The bulk-attach should select_related job and job__job_template
so accessing them doesn't cause extra queries."""
inventory = self._create_inventory_with_hosts(3)
self._run_job(inventory)
hosts = list(Host.objects.filter(inventory=inventory).with_latest_summary_id())
with CaptureQueriesContext(connection) as ctx:
for host in hosts:
summary = host.latest_summary
_ = summary.job # should not query
assert len(ctx.captured_queries) == 0
def test_chaining_preserves_annotation(self):
"""Chaining .filter() after .with_latest_summary_id() should
preserve the annotation and bulk-attach behavior."""
inventory = self._create_inventory_with_hosts(5)
self._run_job(inventory)
hosts = list(Host.objects.filter(inventory=inventory).with_latest_summary_id().filter(name__startswith='host-').order_by('name'))
assert len(hosts) == 5
for host in hosts:
assert hasattr(host, '_latest_summary_cache')
assert host._latest_summary_cache is not None
def test_multiple_jobs_latest_wins(self):
"""After multiple jobs, latest_summary should return the most recent."""
inventory = self._create_inventory_with_hosts(1)
self._run_job(inventory)
self._run_job(inventory)
job3 = self._run_job(inventory)
host = Host.objects.filter(inventory=inventory).with_latest_summary_id().first()
assert host.latest_summary.job_id == job3.id
def test_partial_host_coverage(self):
"""When a job only touches some hosts, only those hosts get summaries."""
inventory = self._create_inventory_with_hosts(5)
self._run_job(inventory, host_names=['host-0', 'host-1'])
hosts = list(Host.objects.filter(inventory=inventory).with_latest_summary_id().order_by('name'))
with_summary = [h for h in hosts if h.latest_summary is not None]
without_summary = [h for h in hosts if h.latest_summary is None]
assert len(with_summary) == 2
assert len(without_summary) == 3
assert sorted([h.name for h in with_summary]) == ['host-0', 'host-1']