AAP-68024 perf: derive last_job_host_summary from query instead of denormalized FK (#16332)

* perf: stop eagerly updating Host.last_job_host_summary on every job completion

The playbook_on_stats wrap-up path bulk-updates last_job_host_summary_id
on every host touched by a job. In the Q4CY25 scale lab this query had
a median execution time of 75 seconds due to index churn on main_host.

Replace all reads of the denormalized FK with a new classmethod
JobHostSummary.latest_for_host(host_id) that queries for the most
recent summary on demand. This eliminates the write-side bulk_update
of last_job_host_summary_id entirely.

Changes:
- Add JobHostSummary.latest_for_host() classmethod
- Serializer: use latest_for_host() instead of obj.last_job_host_summary
- Dashboard view: use subquery instead of FK traversal for failed hosts
- Inventory.update_computed_fields: use subquery for failed host count
- events.py: remove last_job_host_summary_id from bulk_update
- signals.py: simplify _update_host_last_jhs to only update last_job
- access.py/managers.py: remove select_related/defer through the FK

The FK field on Host is left in place for now (removal requires a
migration) but is no longer written to.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
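The read-time derivation described above is, in SQL terms, a correlated "latest row per host" subquery. A minimal runnable sketch, using an illustrative sqlite schema (table and column names are assumptions, not the real AWX tables):

```python
# Sketch of the per-host "latest summary" lookup that replaces the
# denormalized last_job_host_summary FK, shown as the SQL a Django
# Subquery would compile to. Schema is illustrative only.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE host (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE job_host_summary (
        id INTEGER PRIMARY KEY, host_id INTEGER, failed INTEGER
    );
    INSERT INTO host VALUES (1, 'web1'), (2, 'db1');
    -- host 1 ran twice; its latest summary (id=3) succeeded
    INSERT INTO job_host_summary VALUES (1, 1, 1), (2, 2, 1), (3, 1, 0);
    """
)
rows = conn.execute(
    """
    SELECT h.id,
           (SELECT s.failed FROM job_host_summary s
            WHERE s.host_id = h.id ORDER BY s.id DESC LIMIT 1) AS latest_failed
    FROM host h ORDER BY h.id
    """
).fetchall()
print(rows)  # [(1, 0), (2, 1)]
```

No write happens at job completion; the latest summary is resolved at read time by ordering on the summary primary key.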

* Fix .pk AttributeError, add job_template annotations, annotate host sublists

- Add 'pk' to AnnotatedSummary dynamic type (fixes AttributeError in get_related)
- Add job_template_id and job_template_name to subquery annotations so list
  views include these fields in summary_fields.last_job (matching detail views)
- Traverse job__ FK from JobHostSummary instead of using separate UnifiedJob
  subquery with OuterRef on another annotation (cleaner SQL, avoids alias issue)
- Annotate all host sublist views (InventoryHostsList, GroupHostsList,
  GroupAllHostsList, InventorySourceHostsList) to prevent N+1 queries

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Update test_events to use JobHostSummary.latest_for_host instead of stale FKs

Tests were asserting host.last_job_id and host.last_job_host_summary_id
which are no longer updated. Use JobHostSummary.latest_for_host() to
derive the same data, matching the new read-time derivation approach.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Remove stale failures_url from deprecated DashboardView

The failures_url linked to ?last_job_host_summary__failed=True which
filters on the now-stale FK. The dashboard count itself was already
fixed to use a subquery annotation. Since DashboardView is deprecated
and has_active_failures is a SerializerMethodField (not filterable),
remove the failures_url entirely rather than creating a custom filter.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Apply black formatting to changed files

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Refactor: replace 10 subquery annotations with bulk prefetch

Instead of annotating every host queryset with 10 correlated subqueries
(summary + job + job_template fields), annotate only _latest_summary_id
and bulk-fetch the full JobHostSummary objects after pagination via
select_related('job', 'job__job_template').

This reduces the SQL from 10 correlated subqueries to 1 subquery + 1 IN
query, addressing review feedback about annotation overhead on host list
views.

- _annotate_host_latest_summary: only annotates _latest_summary_id
- _prefetch_latest_summaries: bulk-fetches and attaches to host objects
- HostSummaryPrefetchMixin: hooks into list() after pagination
- Serializer uses real JobHostSummary objects (no more AnnotatedSummary)
- to_representation always overwrites stale FK values

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
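The "1 subquery + 1 IN query" shape described above can be sketched outside Django; query 1 annotates each host row with its latest summary id, query 2 bulk-fetches only those summaries (Django's in_bulk() compiles to such an IN query). Schema and names are illustrative assumptions:

```python
# Sketch of the two-query bulk-prefetch pattern: annotate the latest id,
# then fetch those rows in one IN query and attach them in Python.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE host (id INTEGER PRIMARY KEY);
    CREATE TABLE job_host_summary (id INTEGER PRIMARY KEY, host_id INTEGER, failed INTEGER);
    INSERT INTO host VALUES (1), (2), (3);      -- host 3 never ran a job
    INSERT INTO job_host_summary VALUES (1, 1, 0), (2, 2, 1), (3, 1, 1);
    """
)

# Query 1: one correlated subquery instead of ten
annotated = conn.execute(
    """
    SELECT h.id,
           (SELECT s.id FROM job_host_summary s
            WHERE s.host_id = h.id ORDER BY s.id DESC LIMIT 1) AS latest_id
    FROM host h
    """
).fetchall()

# Query 2: bulk-fetch only the annotated ids
ids = [latest_id for _, latest_id in annotated if latest_id is not None]
placeholders = ",".join("?" * len(ids))
summaries = {
    row[0]: row
    for row in conn.execute(
        f"SELECT id, host_id, failed FROM job_host_summary WHERE id IN ({placeholders})", ids
    )
}

# Attach: hosts with no summary map to None, mirroring _latest_summary_cache
attached = {host_id: summaries.get(latest_id) for host_id, latest_id in annotated}
print(attached)  # {1: (3, 1, 1), 2: (2, 2, 1), 3: None}
```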

* Refactor: move latest summary to QuerySet._fetch_all + Host.latest_summary

Per review feedback, replace the view-level HostSummaryPrefetchMixin
with a custom QuerySet that bulk-attaches summaries at evaluation time
(like prefetch_related), and a Host.latest_summary property as the
single access point.

- HostLatestSummaryQuerySet: overrides _fetch_all() to bulk-fetch
  JobHostSummary objects with select_related after queryset evaluation
- HostManager now inherits from the custom queryset via from_queryset()
- Host.latest_summary property: uses cache if available, falls back to
  individual query
- Remove _annotate_host_latest_summary, _prefetch_latest_summaries,
  HostSummaryPrefetchMixin from views — no more list() override needed
- Remove last_job/last_job_host_summary from SUMMARIZABLE_FK_FIELDS
- Serializer uses obj.latest_summary and DEFAULT_SUMMARY_FIELDS loop

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
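The evaluation-time hook described above can be modeled in miniature: a lazy queryset whose fetch step bulk-attaches related data once the result cache exists, plus a model property that prefers the cache and otherwise falls back to a per-object query. This is a toy, not Django's internals; all names are illustrative:

```python
# Toy model of the _fetch_all bulk-attach hook and the latest_summary
# property's cache-then-fallback behavior.
SUMMARIES = {1: "summary-for-host-1", 2: "summary-for-host-2"}
FALLBACK_QUERIES = []  # records per-object fallback lookups

class Host:
    def __init__(self, pk):
        self.pk = pk

    @property
    def latest_summary(self):
        if hasattr(self, "_latest_summary_cache"):
            return self._latest_summary_cache  # attached at queryset evaluation
        FALLBACK_QUERIES.append(self.pk)       # per-object fallback query
        return SUMMARIES.get(self.pk)

class HostQuerySet:
    def __init__(self, pks):
        self._pks = pks
        self._result_cache = None

    def _fetch_all(self):
        if self._result_cache is None:
            self._result_cache = [Host(pk) for pk in self._pks]
            # bulk-attach in one pass, like prefetch_related
            for host in self._result_cache:
                host._latest_summary_cache = SUMMARIES.get(host.pk)

    def __iter__(self):
        self._fetch_all()
        return iter(self._result_cache)

hosts = list(HostQuerySet([1, 2, 3]))
results = [h.latest_summary for h in hosts]
print(results)           # all served from the cache
print(FALLBACK_QUERIES)  # [] -- no per-object queries were needed
```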

* Fix: scope annotation to views, restore license_error/canceled_on

- Remove with_latest_summary_id() from HostManager.get_queryset() to
  avoid applying the correlated subquery to every Host query globally
  (count, exists, internal relations)
- Apply with_latest_summary_id() in get_queryset() of the 6
  host-serving views only
- Restore license_error and canceled_on to last_job summary fields
  to avoid breaking API change

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Guard _fetch_all() to skip bulk-attach on non-annotated querysets

Without this guard, _fetch_all() would set _latest_summary_cache=None
on every host in non-annotated querysets (e.g. Host.objects.filter()),
masking the per-object fallback query in Host.latest_summary.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
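The masking bug this guard prevents can be shown in a few lines; without the annotation check, writing a None cache onto every host silences the fallback path. Names are illustrative, not the AWX code:

```python
# Why the guard matters: an unconditional _latest_summary_cache = None
# makes the property return None instead of running its fallback query.
class Host:
    def __init__(self, pk, summary):
        self.pk = pk
        self._db_summary = summary  # stands in for the real fallback query

    @property
    def latest_summary(self):
        if hasattr(self, "_latest_summary_cache"):
            return self._latest_summary_cache
        return self._db_summary

host = Host(1, "real-summary")
before = host.latest_summary          # fallback works: "real-summary"

# A buggy _fetch_all without the hasattr('_latest_summary_id') guard:
host._latest_summary_cache = None
after = host.latest_summary           # fallback is now masked: None
print(before, after)
```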

* Remove name from last_job_host_summary and canceled_on from last_job summary

Per reviewer feedback: these fields were not in the original API contract
via SUMMARIZABLE_FK_FIELDS and their addition would be an API change.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Add functional tests for HostLatestSummaryQuerySet and Host.latest_summary

Tests cover:
- with_latest_summary_id() annotation and most-recent selection
- _fetch_all() bulk-attach behavior on annotated querysets
- _fetch_all() skips non-annotated querysets (preserves fallback)
- .count() and .exists() do NOT trigger _fetch_all
- Host.latest_summary cache hits (zero queries) and fallback
- Host.latest_job property
- select_related on bulk-attached summaries (no N+1)
- Chaining preserves annotation
- Multiple jobs / partial host coverage

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Apply black formatting to test_host_queryset.py

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Ben Thomasson <bthomass@redhat.com>

* Fix flake8 F841: remove unused job1/job2 variables in tests

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Ben Thomasson <bthomass@redhat.com>

* Add comment explaining why Prefetch was not used for host latest summary

Django Prefetch cannot handle latest per group -- [:1] slicing fetches
1 record globally, not per host (Django ticket #26780). The custom
_fetch_all override uses the same 2-query pattern as prefetch_related
internally, customized for this use case.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
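The limitation described above reduces to SQL: a sliced queryset like `.order_by('-id')[:1]` compiles to a single global LIMIT 1, not "one row per host". A runnable demonstration with an illustrative schema (not the AWX tables):

```python
# Global LIMIT 1 vs. latest-per-group, the distinction Prefetch cannot express.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE job_host_summary (id INTEGER PRIMARY KEY, host_id INTEGER);
    INSERT INTO job_host_summary VALUES (1, 1), (2, 2), (3, 1);
    """
)

# What a Prefetch(queryset=...[:1]) would execute: one row total
global_limit = conn.execute(
    "SELECT id, host_id FROM job_host_summary ORDER BY id DESC LIMIT 1"
).fetchall()
print(global_limit)  # [(3, 1)] -- host 2's latest summary is simply missing

# What "latest per host" actually needs: a limit applied per host_id
per_host = conn.execute(
    """
    SELECT s.id, s.host_id FROM job_host_summary s
    WHERE s.id = (SELECT MAX(s2.id) FROM job_host_summary s2
                  WHERE s2.host_id = s.host_id)
    ORDER BY s.host_id
    """
).fetchall()
print(per_host)  # [(3, 1), (2, 2)]
```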

* Fix null handling to keep old behavior

---------

Signed-off-by: Ben Thomasson <bthomass@redhat.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: AlanCoding <arominge@redhat.com>
Ben Thomasson, 2026-04-28 10:47:22 -04:00, committed by GitHub
parent f3b7d442c3, commit d1b3ae53ae
11 changed files with 490 additions and 100 deletions


@@ -174,8 +174,8 @@ SUMMARIZABLE_FK_FIELDS = {
     'workflow_approval': DEFAULT_SUMMARY_FIELDS + ('timeout',),
     'schedule': DEFAULT_SUMMARY_FIELDS + ('next_run',),
     'unified_job_template': DEFAULT_SUMMARY_FIELDS + ('unified_job_type',),
-    'last_job': DEFAULT_SUMMARY_FIELDS + ('finished', 'status', 'failed', 'license_error', 'canceled_on'),
-    'last_job_host_summary': DEFAULT_SUMMARY_FIELDS + ('failed',),
+    # last_job and last_job_host_summary are derived from JobHostSummary in HostSerializer,
+    # not from the stale FK fields on Host.
     'last_update': DEFAULT_SUMMARY_FIELDS + ('status', 'failed', 'license_error'),
     'current_update': DEFAULT_SUMMARY_FIELDS + ('status', 'failed', 'license_error'),
     'current_job': DEFAULT_SUMMARY_FIELDS + ('status', 'failed', 'license_error'),
@@ -1837,19 +1837,35 @@ class HostSerializer(BaseSerializerWithVariables):
             res['ansible_facts'] = self.reverse('api:host_ansible_facts_detail', kwargs={'pk': obj.instance_id})
         if obj.inventory:
             res['inventory'] = self.reverse('api:inventory_detail', kwargs={'pk': obj.inventory.pk})
-        if obj.last_job:
-            res['last_job'] = self.reverse('api:job_detail', kwargs={'pk': obj.last_job.pk})
-        if obj.last_job_host_summary:
-            res['last_job_host_summary'] = self.reverse('api:job_host_summary_detail', kwargs={'pk': obj.last_job_host_summary.pk})
+        last_summary = obj.latest_summary
+        if last_summary:
+            res['last_job_host_summary'] = self.reverse('api:job_host_summary_detail', kwargs={'pk': last_summary.pk})
+            if last_summary.job_id:
+                res['last_job'] = self.reverse('api:job_detail', kwargs={'pk': last_summary.job_id})
         return res

     def get_summary_fields(self, obj):
         d = super(HostSerializer, self).get_summary_fields(obj)
-        try:
-            d['last_job']['job_template_id'] = obj.last_job.job_template.id
-            d['last_job']['job_template_name'] = obj.last_job.job_template.name
-        except (KeyError, AttributeError):
-            pass
+        last_summary = obj.latest_summary
+        if last_summary:
+            d['last_job_host_summary'] = OrderedDict()
+            d['last_job_host_summary']['id'] = last_summary.id
+            d['last_job_host_summary']['failed'] = last_summary.failed
+            try:
+                last_job = last_summary.job
+                d['last_job'] = OrderedDict()
+                for field in DEFAULT_SUMMARY_FIELDS + ('finished', 'status', 'failed', 'canceled_on'):
+                    fval = getattr(last_job, field, None)
+                    if fval is not None:
+                        d['last_job'][field] = fval
+                if last_job.job_template:
+                    d['last_job']['job_template_id'] = last_job.job_template.id
+                    d['last_job']['job_template_name'] = last_job.job_template.name
+            except ObjectDoesNotExist:
+                pass
+        else:
+            d.pop('last_job', None)
+            d.pop('last_job_host_summary', None)
         if has_model_field_prefetched(obj, 'groups'):
             group_list = sorted([{'id': g.id, 'name': g.name} for g in obj.groups.all()], key=lambda x: x['id'])[:5]
         else:
@@ -1924,14 +1940,16 @@ class HostSerializer(BaseSerializerWithVariables):
             return ret
         if 'inventory' in ret and not obj.inventory:
             ret['inventory'] = None
-        if 'last_job' in ret and not obj.last_job:
-            ret['last_job'] = None
-        if 'last_job_host_summary' in ret and not obj.last_job_host_summary:
-            ret['last_job_host_summary'] = None
+        last_summary = obj.latest_summary
+        if 'last_job' in ret:
+            ret['last_job'] = last_summary.job_id if last_summary else None
+        if 'last_job_host_summary' in ret:
+            ret['last_job_host_summary'] = last_summary.pk if last_summary else None
         return ret

     def get_has_active_failures(self, obj):
-        return bool(obj.last_job_host_summary and obj.last_job_host_summary.failed)
+        last_summary = obj.latest_summary
+        return bool(last_summary and last_summary.failed)

     def get_has_inventory_sources(self, obj):
         return obj.inventory_sources.exists()


@@ -21,7 +21,7 @@ from urllib3.exceptions import ConnectTimeoutError
 # Django
 from django.conf import settings
 from django.core.exceptions import FieldError, ObjectDoesNotExist
-from django.db.models import Q, Sum, Count
+from django.db.models import Q, Sum, Count, Subquery, OuterRef
 from django.db import IntegrityError, ProgrammingError, transaction, connection
 from django.db.models.fields.related import ManyToManyField, ForeignKey
 from django.db.models.functions import Trunc

@@ -210,10 +210,10 @@ class DashboardView(APIView):
         data['groups'] = {'url': reverse('api:group_list', request=request), 'total': user_groups.count(), 'inventory_failed': groups_inventory_failed}

         user_hosts = get_user_queryset(request.user, models.Host)
-        user_hosts_failed = user_hosts.filter(last_job_host_summary__failed=True)
+        latest_summary_failed = Subquery(models.JobHostSummary.objects.filter(host_id=OuterRef('pk')).order_by('-id').values('failed')[:1])
+        user_hosts_failed = user_hosts.annotate(_latest_failed=latest_summary_failed).filter(_latest_failed=True)
         data['hosts'] = {
             'url': reverse('api:host_list', request=request),
-            'failures_url': reverse('api:host_list', request=request) + "?last_job_host_summary__failed=True",
             'total': user_hosts.count(),
             'failed': user_hosts_failed.count(),
         }

@@ -1943,7 +1943,7 @@ class HostList(HostRelatedSearchMixin, ListCreateAPIView):
         if filter_string:
             filter_qs = SmartFilter.query_from_string(filter_string)
             qs &= filter_qs
-        return qs.distinct()
+        return qs.distinct().with_latest_summary_id()

     def list(self, *args, **kwargs):
         try:

@@ -1958,6 +1958,9 @@ class HostDetail(RelatedJobsPreventDeleteMixin, RetrieveUpdateDestroyAPIView):
     serializer_class = serializers.HostSerializer
     resource_purpose = 'host detail'

+    def get_queryset(self):
+        return super().get_queryset().with_latest_summary_id()
+
     @extend_schema_if_available(extensions={"x-ai-description": "Delete a host"})
     def delete(self, request, *args, **kwargs):
         if self.get_object().inventory.pending_deletion:

@@ -1991,6 +1994,9 @@ class InventoryHostsList(HostRelatedSearchMixin, SubListCreateAttachDetachAPIView):
     filter_read_permission = False
     resource_purpose = 'hosts of an inventory'

+    def get_queryset(self):
+        return super().get_queryset().with_latest_summary_id()
+

 class HostGroupsList(SubListCreateAttachDetachAPIView):
     '''the list of groups a host is directly a member of'''

@@ -2174,6 +2180,9 @@ class GroupHostsList(HostRelatedSearchMixin, SubListCreateAttachDetachAPIView):
     relationship = 'hosts'
     resource_purpose = 'hosts of a group'

+    def get_queryset(self):
+        return super().get_queryset().with_latest_summary_id()
+
     def update_raw_data(self, data):
         data.pop('inventory', None)
         return super(GroupHostsList, self).update_raw_data(data)

@@ -2205,7 +2214,7 @@ class GroupAllHostsList(HostRelatedSearchMixin, SubListAPIView):
         self.check_parent_access(parent)
         qs = self.request.user.get_queryset(self.model).distinct()  # need distinct for '&' operator
         sublist_qs = parent.all_hosts.distinct()
-        return qs & sublist_qs
+        return (qs & sublist_qs).with_latest_summary_id()


 class GroupInventorySourcesList(SubListAPIView):

@@ -2498,6 +2507,9 @@ class InventorySourceHostsList(HostRelatedSearchMixin, SubListDestroyAPIView):
     check_sub_obj_permission = False
     resource_purpose = 'hosts of an inventory source'

+    def get_queryset(self):
+        return super().get_queryset().with_latest_summary_id()
+
     def perform_list_destroy(self, instance_list):
         inv_source = self.get_parent_object()
         with ignore_inventory_computed_fields():


@@ -897,8 +897,6 @@ class HostAccess(BaseAccess):
         'created_by',
         'modified_by',
         'inventory',
-        'last_job__job_template',
-        'last_job_host_summary__job',
     )
     prefetch_related = ('groups', 'inventory_sources')


@@ -5,6 +5,7 @@ import logging
 import uuid

 from django.db import models
 from django.conf import settings
+from django.db.models import OuterRef, Subquery
 from django.db.models.functions import Lower

 from ansible_base.lib.utils.db import advisory_lock

@@ -23,7 +24,65 @@ class DeferJobCreatedManager(models.Manager):
         return super(DeferJobCreatedManager, self).get_queryset().defer('job_created')


-class HostManager(models.Manager):
+class HostLatestSummaryQuerySet(models.QuerySet):
+    """Queryset that annotates and bulk-attaches the latest JobHostSummary
+    at queryset evaluation time, similar to prefetch_related().
+
+    Why not use Django's Prefetch?
+    Django's Prefetch with [:1] slicing fetches 1 record globally, not per-host
+    (Django ticket #26780). Window-function workarounds require Django 4.2+ and
+    are more complex. Prefetching all summaries then filtering in Python wastes
+    memory for hosts with many job runs. The approach here — annotate the latest
+    ID via Subquery, then in_bulk() only those IDs — is the same 2-query pattern
+    prefetch_related uses internally, customized for "latest per group."
+
+    Not streaming-safe: relies on _result_cache existing after _fetch_all().
+    """
+
+    _awx_latest_summary_attached = False
+
+    def _clone(self):
+        clone = super()._clone()
+        clone._awx_latest_summary_attached = self._awx_latest_summary_attached
+        return clone
+
+    def with_latest_summary_id(self):
+        from awx.main.models.jobs import JobHostSummary
+
+        latest_summary = JobHostSummary.objects.filter(host_id=OuterRef('pk')).order_by('-id')
+        return self.annotate(
+            _latest_summary_id=Subquery(latest_summary.values('id')[:1]),
+        )
+
+    def _fetch_all(self):
+        super()._fetch_all()
+        if self._awx_latest_summary_attached or not self._result_cache:
+            return
+        # Only bulk-attach if the queryset was annotated via with_latest_summary_id().
+        # Without this guard, we'd set _latest_summary_cache=None on every host,
+        # masking the per-object fallback query in Host.latest_summary.
+        if not hasattr(self._result_cache[0], '_latest_summary_id'):
+            return
+        from awx.main.models.jobs import JobHostSummary
+
+        latest_summary_ids = [host._latest_summary_id for host in self._result_cache if host._latest_summary_id is not None]
+        if latest_summary_ids:
+            summaries_by_id = JobHostSummary.objects.select_related('job', 'job__job_template').in_bulk(latest_summary_ids)
+        else:
+            summaries_by_id = {}
+        for host in self._result_cache:
+            latest_summary_id = getattr(host, '_latest_summary_id', None)
+            host._latest_summary_cache = summaries_by_id.get(latest_summary_id)
+        self._awx_latest_summary_attached = True
+
+
+class HostManager(models.Manager.from_queryset(HostLatestSummaryQuerySet)):
     """Custom manager class for Hosts model."""

     def active_count(self):

@@ -53,16 +112,7 @@ class HostManager(models.Manager):
         """When the parent instance of the host query set has a `kind=smart` and a `host_filter`
         set. Use the `host_filter` to generate the queryset for the hosts.
         """
-        qs = (
-            super(HostManager, self)
-            .get_queryset()
-            .defer(
-                'last_job__extra_vars',
-                'last_job_host_summary__job__extra_vars',
-                'last_job__artifacts',
-                'last_job_host_summary__job__artifacts',
-            )
-        )
+        qs = super().get_queryset()
         if hasattr(self, 'instance') and hasattr(self.instance, 'host_filter') and hasattr(self.instance, 'kind'):
             if self.instance.kind == 'smart' and self.instance.host_filter is not None:


@@ -24,7 +24,6 @@ from awx.main.managers import DeferJobCreatedManager
 from awx.main.constants import MINIMAL_EVENTS
 from awx.main.models.base import CreatedModifiedModel
 from awx.main.utils import ignore_inventory_computed_fields, camelcase_to_underscore
-from awx.main.utils.db import bulk_update_sorted_by_id

 analytics_logger = logging.getLogger('awx.analytics.job_events')

@@ -590,20 +589,8 @@ class JobEvent(BasePlaybookEvent):
             JobHostSummary.objects.bulk_create(summaries.values())

-            # update the last_job_id and last_job_host_summary_id
-            # in single queries
-            host_mapping = dict((summary['host_id'], summary['id']) for summary in JobHostSummary.objects.filter(job_id=job.id).values('id', 'host_id'))
-            updated_hosts = set()
-            for h in all_hosts:
-                # if the hostname *shows up* in the playbook_on_stats event
-                if h.name in hostnames:
-                    h.last_job_id = job.id
-                    updated_hosts.add(h)
-                if h.id in host_mapping:
-                    h.last_job_host_summary_id = host_mapping[h.id]
-                    updated_hosts.add(h)
-            bulk_update_sorted_by_id(Host, updated_hosts, ['last_job_id', 'last_job_host_summary_id'])
+            # last_job and last_job_host_summary are now derived via
+            # JobHostSummary.latest_for_host / latest_job_for_host

             # Create/update Host Metrics
             self._update_host_metrics(updated_hosts_list)


@@ -18,7 +18,7 @@ from django.db import transaction
 from django.core.exceptions import ValidationError
 from django.urls import resolve
 from django.utils.timezone import now
-from django.db.models import Q
+from django.db.models import Q, Subquery, OuterRef

 # REST Framework
 from rest_framework.exceptions import ParseError

@@ -386,7 +386,10 @@ class Inventory(CommonModelNameNotUnique, ResourceMixin, RelatedJobsMixin, OpaQu
         logger.debug("Going to update inventory computed fields, pk={0}".format(self.pk))
         start_time = time.time()
         active_hosts = self.hosts
-        failed_hosts = active_hosts.filter(last_job_host_summary__failed=True)
+        from awx.main.models.jobs import JobHostSummary  # circular import: inventory.py loads before jobs.py
+
+        latest_summary_failed = Subquery(JobHostSummary.objects.filter(host_id=OuterRef('pk')).order_by('-id').values('failed')[:1])
+        failed_hosts = active_hosts.annotate(_latest_failed=latest_summary_failed).filter(_latest_failed=True)
         active_groups = self.groups
         if self.kind == 'smart':
             active_groups = active_groups.none()

@@ -582,6 +585,23 @@ class Host(CommonModelNameNotUnique, RelatedJobsMixin):
     objects = HostManager()

+    @property
+    def latest_summary(self):
+        if hasattr(self, '_latest_summary_cache'):
+            return self._latest_summary_cache
+        from awx.main.models.jobs import JobHostSummary
+
+        summary = JobHostSummary.objects.filter(host_id=self.pk).order_by('-id').select_related('job', 'job__job_template').first()
+        self._latest_summary_cache = summary
+        return summary
+
+    @property
+    def latest_job(self):
+        summary = self.latest_summary
+        if summary is None:
+            return None
+        return summary.job
+
     def get_absolute_url(self, request=None):
         return reverse('api:host_detail', kwargs={'pk': self.pk}, request=request)


@@ -1140,6 +1140,22 @@ class JobHostSummary(CreatedModifiedModel):
             self.skipped,
         )

+    @classmethod
+    def latest_for_host(cls, host_id):
+        """Return the most recent JobHostSummary for a given host, or None."""
+        return cls.objects.filter(host_id=host_id).order_by('-id').first()
+
+    @classmethod
+    def latest_job_for_host(cls, host_id):
+        """Return the Job from the most recent JobHostSummary for a host, or None."""
+        summary = cls.latest_for_host(host_id)
+        if summary:
+            try:
+                return summary.job
+            except cls.job.field.related_model.DoesNotExist:
+                return None
+        return None
+
     def get_absolute_url(self, request=None):
         return reverse('api:job_host_summary_detail', kwargs={'pk': self.pk}, request=request)


@@ -36,7 +36,6 @@ from awx.main.models import (
     Inventory,
     InventorySource,
     Job,
-    JobHostSummary,
     Organization,
     Project,
     Role,

@@ -251,45 +250,9 @@ def migrate_children_from_deleted_group_to_parent_groups(sender, **kwargs):
         pass


-# Update host pointers to last_job and last_job_host_summary when a job is deleted
-def _update_host_last_jhs(host):
-    jhs_qs = JobHostSummary.objects.filter(host__pk=host.pk)
-    try:
-        jhs = jhs_qs.order_by('-job__pk')[0]
-    except IndexError:
-        jhs = None
-    update_fields = []
-    try:
-        last_job = jhs.job if jhs else None
-    except Job.DoesNotExist:
-        # The job (and its summaries) have already been/are currently being
-        # deleted, so there's no need to update the host w/ a reference to it
-        return
-    if host.last_job != last_job:
-        host.last_job = last_job
-        update_fields.append('last_job')
-    if host.last_job_host_summary != jhs:
-        host.last_job_host_summary = jhs
-        update_fields.append('last_job_host_summary')
-    if update_fields:
-        host.save(update_fields=update_fields)
-
-
-@receiver(pre_delete, sender=Job)
-def save_host_pks_before_job_delete(sender, **kwargs):
-    instance = kwargs['instance']
-    hosts_qs = Host.objects.filter(last_job__pk=instance.pk)
-    instance._saved_hosts_pks = set(hosts_qs.values_list('pk', flat=True))
-
-
-@receiver(post_delete, sender=Job)
-def update_host_last_job_after_job_deleted(sender, **kwargs):
-    instance = kwargs['instance']
-    hosts_pks = getattr(instance, '_saved_hosts_pks', [])
-    for host in Host.objects.filter(pk__in=hosts_pks):
-        _update_host_last_jhs(host)
+# Host.last_job and Host.last_job_host_summary are now derived from
+# JobHostSummary.latest_for_host / latest_job_for_host.
+# No signal handlers needed to maintain these denormalized FKs.

 # Set via ActivityStreamRegistrar to record activity stream events


@@ -71,8 +71,10 @@ class TestEvents:
         assert s.skipped == 0

         for host in Host.objects.all():
-            assert host.last_job_id == self.job.id
-            assert host.last_job_host_summary.host == host
+            latest_summary = JobHostSummary.latest_for_host(host.id)
+            assert latest_summary is not None
+            assert latest_summary.job_id == self.job.id
+            assert latest_summary.host == host

     def test_host_summary_generation_with_deleted_hosts(self):
         self._generate_hosts(10)

@@ -91,8 +93,7 @@ class TestEvents:
     def test_host_summary_generation_with_limit(self):
         # Make an inventory with 10 hosts, run a playbook with a --limit
         # pointed at *one* host,
-        # Verify that *only* that host has an associated JobHostSummary and that
-        # *only* that host has an updated value for .last_job.
+        # Verify that *only* that host has an associated JobHostSummary.
         self._generate_hosts(10)

         # by making the playbook_on_stats *only* include Host 1, we're emulating

@@ -105,13 +106,14 @@ class TestEvents:
         # be related to the appropriate Host)
         assert JobHostSummary.objects.count() == 1
         for h in Host.objects.all():
+            latest_summary = JobHostSummary.latest_for_host(h.id)
             if h.name == 'Host 1':
-                assert h.last_job_id == self.job.id
-                assert h.last_job_host_summary_id == JobHostSummary.objects.first().id
+                assert latest_summary is not None
+                assert latest_summary.job_id == self.job.id
+                assert latest_summary.id == JobHostSummary.objects.first().id
             else:
-                # all other hosts in the inventory should remain untouched
-                assert h.last_job_id is None
-                assert h.last_job_host_summary_id is None
+                # all other hosts in the inventory should have no summary
+                assert latest_summary is None

     def test_host_metrics_insert(self):
         self._generate_hosts(10)


@@ -0,0 +1,213 @@
import pytest

from django.test.utils import CaptureQueriesContext
from django.db import connection
from django.utils.timezone import now

from awx.main.models import Job, JobEvent, Inventory, Host, JobHostSummary


@pytest.mark.django_db
class TestHostLatestSummaryQuerySet:
    """Tests for HostLatestSummaryQuerySet and Host.latest_summary property."""

    def _create_inventory_with_hosts(self, count=5):
        inventory = Inventory()
        inventory.save()
        Host.objects.bulk_create([Host(created=now(), modified=now(), name=f'host-{i}', inventory_id=inventory.id) for i in range(count)])
        return inventory

    def _run_job(self, inventory, host_names=None):
        """Run a fake job that creates JobHostSummary records for the given hosts."""
        if host_names is None:
            host_names = list(inventory.hosts.values_list('name', flat=True))
        job = Job(inventory=inventory)
        job.save()
        host_map = dict(inventory.hosts.values_list('name', 'id'))
        JobEvent.create_from_data(
            job_id=job.pk,
            parent_uuid='abc123',
            event='playbook_on_stats',
            event_data={
                'ok': {name: 1 for name in host_names},
                'changed': {},
                'dark': {},
                'failures': {},
                'ignored': {},
                'processed': {},
                'rescued': {},
                'skipped': {},
            },
            host_map=host_map,
        ).save()
        return job

    def test_with_latest_summary_id_annotates_hosts(self):
        inventory = self._create_inventory_with_hosts(3)
        job = self._run_job(inventory)
        hosts = Host.objects.filter(inventory=inventory).with_latest_summary_id()
        for host in hosts:
            assert hasattr(host, '_latest_summary_id')
            summary = JobHostSummary.objects.filter(host=host, job=job).first()
            assert host._latest_summary_id == summary.id

    def test_with_latest_summary_id_returns_most_recent(self):
        inventory = self._create_inventory_with_hosts(1)
        self._run_job(inventory)
        job2 = self._run_job(inventory)
        host = Host.objects.filter(inventory=inventory).with_latest_summary_id().first()
        latest = JobHostSummary.objects.filter(host_id=host.id).order_by('-id').first()
        assert latest.job_id == job2.id
        assert host._latest_summary_id == latest.id

    def test_with_latest_summary_id_none_for_no_summaries(self):
        inventory = self._create_inventory_with_hosts(1)
        # No job has run, so no summaries exist
        host = Host.objects.filter(inventory=inventory).with_latest_summary_id().first()
        assert host._latest_summary_id is None

    def test_fetch_all_bulk_attaches_summaries(self):
        inventory = self._create_inventory_with_hosts(5)
        self._run_job(inventory)
        hosts = list(Host.objects.filter(inventory=inventory).with_latest_summary_id())
        for host in hosts:
            assert hasattr(host, '_latest_summary_cache')
            assert host._latest_summary_cache is not None
            assert isinstance(host._latest_summary_cache, JobHostSummary)

    def test_fetch_all_skips_non_annotated_querysets(self):
        """Non-annotated querysets should NOT set _latest_summary_cache,
        preserving the per-object fallback in Host.latest_summary."""
        inventory = self._create_inventory_with_hosts(3)
        self._run_job(inventory)
        hosts = list(Host.objects.filter(inventory=inventory))
        for host in hosts:
            assert not hasattr(host, '_latest_summary_cache')

    def test_count_does_not_trigger_fetch_all(self):
        """Calling .count() should not trigger _fetch_all or the bulk-attach logic."""
        inventory = self._create_inventory_with_hosts(5)
        self._run_job(inventory)
        qs = Host.objects.filter(inventory=inventory).with_latest_summary_id()
        with CaptureQueriesContext(connection) as ctx:
            result = qs.count()
        assert result == 5
        # count() should produce a single COUNT query, not fetch all rows plus summaries
        assert len(ctx.captured_queries) == 1
        assert 'COUNT' in ctx.captured_queries[0]['sql'].upper()

    def test_exists_does_not_trigger_fetch_all(self):
        inventory = self._create_inventory_with_hosts(1)
        self._run_job(inventory)
        qs = Host.objects.filter(inventory=inventory).with_latest_summary_id()
        with CaptureQueriesContext(connection) as ctx:
            result = qs.exists()
        assert result is True
        assert len(ctx.captured_queries) == 1

    def test_latest_summary_property_uses_cache(self):
        """When loaded via with_latest_summary_id(), Host.latest_summary
        should use the bulk-attached cache without extra queries."""
        inventory = self._create_inventory_with_hosts(3)
        self._run_job(inventory)
        hosts = list(Host.objects.filter(inventory=inventory).with_latest_summary_id())
        with CaptureQueriesContext(connection) as ctx:
            for host in hosts:
                summary = host.latest_summary
                assert summary is not None
        # No additional queries; all data came from the bulk-attach
        assert len(ctx.captured_queries) == 0

    def test_latest_summary_property_fallback(self):
        """When loaded without annotation, Host.latest_summary should
        fall back to a per-object query."""
        inventory = self._create_inventory_with_hosts(1)
        job = self._run_job(inventory)
        host = Host.objects.filter(inventory=inventory).first()
        assert not hasattr(host, '_latest_summary_cache')
        summary = host.latest_summary
        assert summary is not None
        assert summary.job_id == job.id
        # After first access, the cache should be populated
        assert hasattr(host, '_latest_summary_cache')

    def test_latest_summary_none_when_no_summaries(self):
        inventory = self._create_inventory_with_hosts(1)
        host = Host.objects.filter(inventory=inventory).with_latest_summary_id().first()
        assert host.latest_summary is None

    def test_latest_job_property(self):
        inventory = self._create_inventory_with_hosts(1)
        job = self._run_job(inventory)
        host = Host.objects.filter(inventory=inventory).with_latest_summary_id().first()
        assert host.latest_job is not None
        assert host.latest_job.id == job.id

    def test_latest_job_none_when_no_summaries(self):
        inventory = self._create_inventory_with_hosts(1)
        host = Host.objects.filter(inventory=inventory).first()
        assert host.latest_job is None

    def test_bulk_attach_select_related(self):
        """The bulk-attach should select_related job and job__job_template
        so accessing them doesn't cause extra queries."""
        inventory = self._create_inventory_with_hosts(3)
        self._run_job(inventory)
        hosts = list(Host.objects.filter(inventory=inventory).with_latest_summary_id())
        with CaptureQueriesContext(connection) as ctx:
            for host in hosts:
                summary = host.latest_summary
                _ = summary.job  # should not query
        assert len(ctx.captured_queries) == 0

    def test_chaining_preserves_annotation(self):
        """Chaining .filter() after .with_latest_summary_id() should
        preserve the annotation and bulk-attach behavior."""
        inventory = self._create_inventory_with_hosts(5)
        self._run_job(inventory)
        hosts = list(Host.objects.filter(inventory=inventory).with_latest_summary_id().filter(name__startswith='host-').order_by('name'))
        assert len(hosts) == 5
        for host in hosts:
            assert hasattr(host, '_latest_summary_cache')
            assert host._latest_summary_cache is not None

    def test_multiple_jobs_latest_wins(self):
        """After multiple jobs, latest_summary should return the most recent."""
        inventory = self._create_inventory_with_hosts(1)
        self._run_job(inventory)
        self._run_job(inventory)
        job3 = self._run_job(inventory)
        host = Host.objects.filter(inventory=inventory).with_latest_summary_id().first()
        assert host.latest_summary.job_id == job3.id

    def test_partial_host_coverage(self):
        """When a job only touches some hosts, only those hosts get summaries."""
        inventory = self._create_inventory_with_hosts(5)
        self._run_job(inventory, host_names=['host-0', 'host-1'])
        hosts = list(Host.objects.filter(inventory=inventory).with_latest_summary_id().order_by('name'))
        with_summary = [h for h in hosts if h.latest_summary is not None]
        without_summary = [h for h in hosts if h.latest_summary is None]
        assert len(with_summary) == 2
        assert len(without_summary) == 3
        assert sorted([h.name for h in with_summary]) == ['host-0', 'host-1']
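The annotate-then-bulk-attach pattern these tests exercise can be sketched without Django. This is a simplified, dependency-free model under assumed names (`FakeHost`, `FakeSummary`, and the list-based signature are illustrative; the real queryset annotates `_latest_summary_id` in SQL with a correlated subquery and attaches the cache inside a `_fetch_all` override):

```python
from dataclasses import dataclass


@dataclass
class FakeSummary:
    id: int
    host_id: int


@dataclass
class FakeHost:
    id: int


def with_latest_summary_id(hosts, summaries):
    # Annotation step: compute each host's newest summary id in one pass
    # (what the real queryset expresses as a correlated SQL subquery).
    newest = {}
    for s in summaries:
        if s.host_id not in newest or s.id > newest[s.host_id]:
            newest[s.host_id] = s.id
    for h in hosts:
        h._latest_summary_id = newest.get(h.id)
    # Bulk-attach step (mirrors the _fetch_all override): one batched lookup
    # keyed by the annotated id, so later property access costs no queries.
    by_id = {s.id: s for s in summaries}
    for h in hosts:
        h._latest_summary_cache = by_id.get(h._latest_summary_id)
    return hosts


hosts = [FakeHost(1), FakeHost(2)]
rows = [FakeSummary(10, 1), FakeSummary(11, 1)]
with_latest_summary_id(hosts, rows)
assert hosts[0]._latest_summary_cache.id == 11  # newest summary attached
assert hosts[1]._latest_summary_cache is None   # host 2 has no summaries
```

The key property the tests pin down is that the attach happens only on full evaluation: aggregate paths like `.count()` and `.exists()` never materialize rows, so they must stay single-query, which the sketch's separation of annotation and attach is meant to illustrate.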


@@ -0,0 +1,111 @@
import pytest

from django.utils.timezone import now

from awx.main.models import Job, JobEvent, JobTemplate, Inventory, Host, JobHostSummary, Project
from awx.api.serializers import HostSerializer


@pytest.mark.django_db
class TestHostSummaryFields:
    """Tests for summary_fields of last_job and last_job_host_summary on HostSerializer."""

    def _setup_host_with_job(self, status='canceled'):
        inventory = Inventory()
        inventory.save()
        host = Host(created=now(), modified=now(), name='test-host', inventory=inventory)
        host.save()
        project = Project(name='test-project')
        project.save()
        jt = JobTemplate(name='test-jt', inventory=inventory, project=project)
        jt.save()
        job = Job(inventory=inventory, job_template=jt, status=status)
        if status in ('successful', 'failed', 'canceled', 'error'):
            job.finished = now()
        if status == 'canceled':
            job.canceled_on = now()
        job.save()
        host_map = {host.name: host.id}
        JobEvent.create_from_data(
            job_id=job.pk,
            parent_uuid='abc123',
            event='playbook_on_stats',
            event_data={
                'ok': {host.name: 1},
                'changed': {},
                'dark': {},
                'failures': {},
                'ignored': {},
                'processed': {},
                'rescued': {},
                'skipped': {},
            },
            host_map=host_map,
        ).save()
        summary = JobHostSummary.objects.filter(host=host, job=job).first()
        host.last_job = job
        host.last_job_host_summary = summary
        host.save(update_fields=['last_job', 'last_job_host_summary'])
        host.refresh_from_db()
        return host, job, summary

    def test_last_job_summary_fields_canceled_job(self):
        host, job, summary = self._setup_host_with_job(status='canceled')
        serializer = HostSerializer()
        d = serializer.get_summary_fields(host)
        assert 'last_job' in d
        last_job = d['last_job']
        expected_keys = {'id', 'name', 'description', 'finished', 'status', 'failed', 'canceled_on', 'job_template_id', 'job_template_name'}
        assert set(last_job.keys()) == expected_keys, f"Unexpected last_job keys: {set(last_job.keys())}"
        assert last_job['id'] == job.id
        assert last_job['status'] == 'canceled'
        assert last_job['canceled_on'] == job.canceled_on
        assert last_job['job_template_id'] == job.job_template.id
        assert last_job['job_template_name'] == job.job_template.name

    def test_last_job_summary_fields_successful_job(self):
        host, job, summary = self._setup_host_with_job(status='successful')
        serializer = HostSerializer()
        d = serializer.get_summary_fields(host)
        assert 'last_job' in d
        last_job = d['last_job']
        expected_keys = {'id', 'name', 'description', 'finished', 'status', 'failed', 'job_template_id', 'job_template_name'}
        assert set(last_job.keys()) == expected_keys, f"Unexpected last_job keys: {set(last_job.keys())}"
        assert last_job['id'] == job.id
        assert last_job['status'] == 'successful'
        assert 'canceled_on' not in last_job, "canceled_on should not appear when None"

    def test_last_job_host_summary_fields(self):
        host, job, summary = self._setup_host_with_job(status='successful')
        serializer = HostSerializer()
        d = serializer.get_summary_fields(host)
        assert 'last_job_host_summary' in d
        last_jhs = d['last_job_host_summary']
        assert last_jhs['id'] == summary.id
        assert 'failed' in last_jhs

    def test_no_summary_fields_without_job(self):
        inventory = Inventory()
        inventory.save()
        host = Host(created=now(), modified=now(), name='lonely-host', inventory=inventory)
        host.save()
        serializer = HostSerializer()
        d = serializer.get_summary_fields(host)
        assert 'last_job' not in d
        assert 'last_job_host_summary' not in d
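The key-set assertions above hinge on one behavior: `canceled_on` appears in `summary_fields['last_job']` only when the job was actually canceled. A hedged sketch of that conditional dict construction (field names are taken from the tests; the helper name and dict-based input are illustrative, not the HostSerializer implementation):

```python
def build_last_job_summary(job: dict) -> dict:
    """Copy the fixed last_job fields, then include canceled_on only when set."""
    keys = ('id', 'name', 'description', 'finished', 'status', 'failed',
            'job_template_id', 'job_template_name')
    out = {k: job.get(k) for k in keys}
    # Omitting the key entirely (rather than emitting null) is what the
    # expected_keys assertions in the two tests above distinguish.
    if job.get('canceled_on') is not None:
        out['canceled_on'] = job['canceled_on']
    return out


canceled = build_last_job_summary({'id': 1, 'status': 'canceled', 'canceled_on': '2025-01-01T00:00:00Z'})
assert 'canceled_on' in canceled

successful = build_last_job_summary({'id': 2, 'status': 'successful', 'canceled_on': None})
assert 'canceled_on' not in successful
```

Per the PR description, list views now get `job_template_id` and `job_template_name` from the subquery annotations, so this shape matches between list and detail responses.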