* perf: stop eagerly updating Host.last_job_host_summary on every job completion
The playbook_on_stats wrapup path bulk-updates last_job_host_summary_id
on every host touched by a job. In the Q4CY25 scale lab this query had
a median execution time of 75 seconds due to index churn on main_host.
Replace all reads of the denormalized FK with a new classmethod
JobHostSummary.latest_for_host(host_id) that queries for the most
recent summary on demand. This eliminates the write-side bulk_update
of last_job_host_summary_id entirely.
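A minimal sketch of such a classmethod (the '-created' ordering and manager
access are assumptions, not lifted from the change):

    @classmethod
    def latest_for_host(cls, host_id):
        # derive the most recent summary at read time instead of
        # maintaining a denormalized FK on Host
        return cls.objects.filter(host_id=host_id).order_by('-created').first()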
Changes:
- Add JobHostSummary.latest_for_host() classmethod
- Serializer: use latest_for_host() instead of obj.last_job_host_summary
- Dashboard view: use subquery instead of FK traversal for failed hosts
- Inventory.update_computed_fields: use subquery for failed host count
- events.py: remove last_job_host_summary_id from bulk_update
- signals.py: simplify _update_host_last_jhs to only update last_job
- access.py/managers.py: remove select_related/defer through the FK
The FK field on Host is left in place for now (removal requires a
migration) but is no longer written to.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Fix .pk AttributeError, add job_template annotations, annotate host sublists
- Add 'pk' to AnnotatedSummary dynamic type (fixes AttributeError in get_related)
- Add job_template_id and job_template_name to subquery annotations so list
views include these fields in summary_fields.last_job (matching detail views)
- Traverse the job__ FK from JobHostSummary instead of using a separate UnifiedJob
  subquery with OuterRef on another annotation (cleaner SQL, avoids an alias
  issue); see the sketch after this list
- Annotate all host sublist views (InventoryHostsList, GroupHostsList,
GroupAllHostsList, InventorySourceHostsList) to prevent N+1 queries
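Roughly, the annotation pattern described above is the following sketch
(annotation names and the selected fields are illustrative):

    from django.db.models import OuterRef, Subquery

    latest = (
        JobHostSummary.objects
        .filter(host_id=OuterRef('pk'))
        .order_by('-created')
    )
    hosts = Host.objects.annotate(
        # traverse job__ from the summary rather than a second UnifiedJob subquery
        last_job_template_id=Subquery(latest.values('job__job_template__id')[:1]),
        last_job_template_name=Subquery(latest.values('job__job_template__name')[:1]),
    )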
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Update test_events to use JobHostSummary.latest_for_host instead of stale FKs
Tests were asserting host.last_job_id and host.last_job_host_summary_id
which are no longer updated. Use JobHostSummary.latest_for_host() to
derive the same data, matching the new read-time derivation approach.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Remove stale failures_url from deprecated DashboardView
The failures_url linked to ?last_job_host_summary__failed=True which
filters on the now-stale FK. The dashboard count itself was already
fixed to use a subquery annotation. Since DashboardView is deprecated
and has_active_failures is a SerializerMethodField (not filterable),
remove the failures_url entirely rather than creating a custom filter.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Apply black formatting to changed files
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Refactor: replace 10 subquery annotations with bulk prefetch
Instead of annotating every host queryset with 10 correlated subqueries
(summary + job + job_template fields), annotate only _latest_summary_id
and bulk-fetch the full JobHostSummary objects after pagination via
select_related('job', 'job__job_template').
This reduces the SQL from 10 correlated subqueries to 1 subquery + 1 IN
query, addressing review feedback about annotation overhead on host list
views.
- _annotate_host_latest_summary: only annotates _latest_summary_id
- _prefetch_latest_summaries: bulk-fetches and attaches to host objects
- HostSummaryPrefetchMixin: hooks into list() after pagination
- Serializer uses real JobHostSummary objects (no more AnnotatedSummary)
- to_representation always overwrites stale FK values
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Refactor: move latest summary to QuerySet._fetch_all + Host.latest_summary
Per review feedback, replace the view-level HostSummaryPrefetchMixin
with a custom QuerySet that bulk-attaches summaries at evaluation time
(like prefetch_related), and a Host.latest_summary property as the
single access point.
- HostLatestSummaryQuerySet: overrides _fetch_all() to bulk-fetch
JobHostSummary objects with select_related after queryset evaluation
- HostManager now inherits from the custom queryset via from_queryset()
- Host.latest_summary property: uses cache if available, falls back to
individual query
- Remove _annotate_host_latest_summary, _prefetch_latest_summaries,
HostSummaryPrefetchMixin from views — no more list() override needed
- Remove last_job/last_job_host_summary from SUMMARIZABLE_FK_FIELDS
- Serializer uses obj.latest_summary and DEFAULT_SUMMARY_FIELDS loop
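Condensed, the shape of this is roughly the sketch below (the real HostManager
keeps its existing behavior and only gains the queryset class; cache attribute
and other details are assumptions):

    from django.db import models
    from django.db.models import OuterRef, Subquery

    class HostLatestSummaryQuerySet(models.QuerySet):
        def with_latest_summary_id(self):
            latest = (JobHostSummary.objects
                      .filter(host_id=OuterRef('pk'))
                      .order_by('-created'))
            return self.annotate(_latest_summary_id=Subquery(latest.values('id')[:1]))

        # _fetch_all() is overridden to bulk-fetch the JobHostSummary rows for the
        # annotated ids (with select_related) and attach them to each host right
        # after the queryset is evaluated (override omitted here for brevity).

    HostManager = models.Manager.from_queryset(HostLatestSummaryQuerySet)

    class Host(models.Model):
        objects = HostManager()

        @property
        def latest_summary(self):
            # single access point: use the bulk-attached cache when present,
            # otherwise fall back to an individual query
            if hasattr(self, '_latest_summary_cache'):
                return self._latest_summary_cache
            return JobHostSummary.latest_for_host(self.id)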
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Fix: scope annotation to views, restore license_error/canceled_on
- Remove with_latest_summary_id() from HostManager.get_queryset() to
avoid applying the correlated subquery to every Host query globally
(count, exists, internal relations)
- Apply with_latest_summary_id() in get_queryset() of the 6
host-serving views only
- Restore license_error and canceled_on to last_job summary fields
to avoid breaking API change
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Guard _fetch_all() to skip bulk-attach on non-annotated querysets
Without this guard, _fetch_all() would set _latest_summary_cache=None
on every host in non-annotated querysets (e.g. Host.objects.filter()),
masking the per-object fallback query in Host.latest_summary.
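Roughly, the guarded override looks like this sketch (checking
query.annotations is an assumed detail; a plain Host queryset is assumed):

    def _fetch_all(self):
        needs_attach = self._result_cache is None
        super()._fetch_all()
        # only bulk-attach when this queryset was built via with_latest_summary_id();
        # otherwise leave hosts untouched so Host.latest_summary falls back to its
        # per-object query instead of seeing a cached None
        if not needs_attach or '_latest_summary_id' not in self.query.annotations:
            return
        ids = {h._latest_summary_id for h in self._result_cache if h._latest_summary_id}
        summaries = (JobHostSummary.objects.filter(id__in=ids)
                     .select_related('job', 'job__job_template').in_bulk())
        for h in self._result_cache:
            h._latest_summary_cache = summaries.get(h._latest_summary_id)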
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Remove name from last_job_host_summary and canceled_on from last_job summary
Per reviewer feedback: these fields were not in the original API contract
via SUMMARIZABLE_FK_FIELDS and their addition would be an API change.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Add functional tests for HostLatestSummaryQuerySet and Host.latest_summary
Tests cover:
- with_latest_summary_id() annotation and most-recent selection
- _fetch_all() bulk-attach behavior on annotated querysets
- _fetch_all() skips non-annotated querysets (preserves fallback)
- .count() and .exists() do NOT trigger _fetch_all
- Host.latest_summary cache hits (zero queries) and fallback
- Host.latest_job property
- select_related on bulk-attached summaries (no N+1)
- Chaining preserves annotation
- Multiple jobs / partial host coverage
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Apply black formatting to test_host_queryset.py
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Ben Thomasson <bthomass@redhat.com>
* Fix flake8 F841: remove unused job1/job2 variables in tests
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Ben Thomasson <bthomass@redhat.com>
* Add comment explaining why Prefetch was not used for host latest summary
Django Prefetch cannot handle latest per group -- [:1] slicing fetches
1 record globally, not per host (Django ticket #26780). The custom
_fetch_all override uses the same 2-query pattern as prefetch_related
internally, customized for this use case.
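For reference, the Prefetch one might be tempted to write is sketched below;
per the note above, the [:1] slice does not yield one row per host (and sliced
prefetch querysets are not supported at all on older Django versions). The
related name and to_attr are assumptions:

    from django.db.models import Prefetch

    Host.objects.prefetch_related(
        Prefetch(
            'job_host_summaries',
            queryset=JobHostSummary.objects.order_by('-created')[:1],
            to_attr='latest_summaries',
        )
    )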
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Fix null handling to keep old behavior
---------
Signed-off-by: Ben Thomasson <bthomass@redhat.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: AlanCoding <arominge@redhat.com>
Removes ability to directly create and delete
receptor addresses for a given node.
Instead, receptor addresses are created automatically
if listener_port is set on the Instance.
For example, patching a "hop" instance
with {"listener_port": 6667}
will create a canonical receptor address with port 6667.
Likewise, peers_from_control_nodes on the instance
sets the peers_from_control_nodes on the canonical
address (if listener port is also set).
protocol is a read-only field that simply reflects
the canonical address protocol.
Other Changes:
- rename k8s_routable to is_internal
- add protocol to ReceptorAddress
- remove peers_from_control_nodes and listener_port
from Instance model
Signed-off-by: Seth Foster <fosterbseth@gmail.com>
Creates a non-deletable address that acts as
the "main" address for this instance.
All other addresses for that instance must
be non-canonical.
When listener_port on an instance is set, automatically
create a canonical receptor address where:
- address is hostname of instance
- port is listener_port
- canonical is True
Additionally, protocol field is added to instance to
denote the receptor listener protocol to use (ws, tcp).
The receptor config listener information is derived from
the listener_port and protocol information. Having a
canonical address that mirrors the listener_port ensures that
an address exists that matches the receptor config information.
Other changes:
- Add managed field to receptor address.
If managed is True, no fields on this address can be edited
via the API.
If canonical is True, only the address field cannot be edited.
- Add managed field to instance. If managed is True, users
cannot set node_state to deprovisioning (i.e. cannot delete node)
This is a change to our mechanism for preventing users from deleting
the mesh ingress hop node.
- Field is_internal is now renamed to k8s_routable
- Add reverse_peers on instance which is a list of instance IDs
that peer to this instance (via an address)
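A rough sketch of the canonical-address derivation described above (the helper
name and where it is hooked in are assumptions, not part of this change):

    def ensure_canonical_address(instance):
        # keep the canonical ReceptorAddress in sync with the instance's
        # listener_port and protocol; only applies when listener_port is set
        if not instance.listener_port:
            return None
        addr, _ = ReceptorAddress.objects.update_or_create(
            instance=instance,
            canonical=True,
            defaults={
                'address': instance.hostname,
                'port': instance.listener_port,
                'protocol': instance.protocol,
            },
        )
        return addr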
Signed-off-by: Seth Foster <fosterbseth@gmail.com>
register_peers has inputs:
source: source instance
peers: list of instances the source should peer to
InstanceLink "target" is now expected to be a ReceptorAddress
For each peer, we can just use the first receptor address. If
multiple receptor addresses exist, throw a command error.
Currently this command is only used on VM-deployments, where
there is only a single receptor address per instance, so this
should work fine.
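The per-peer address resolution amounts to something like this sketch (the
relation name, error wording, and handling of the no-address case are
assumptions):

    from django.core.management.base import CommandError

    def address_for_peer(peer):
        addresses = list(peer.receptor_addresses.all())
        if len(addresses) > 1:
            raise CommandError(f"{peer.hostname} has multiple receptor addresses")
        if not addresses:
            raise CommandError(f"{peer.hostname} has no receptor address to peer to")
        # InstanceLink.target now points at this ReceptorAddress
        return addresses[0]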
Other changes:
drop listener_port field from Instance. Listener port is now just
"port" on ReceptorAddress
Signed-off-by: Seth Foster <fosterbseth@gmail.com>
group_vars all.yaml changes:
- peer entry has two fields, address and port
- receptor_port is inferred from the first
receptor_address entry that uses protocol tcp (see the sketch below)
other changes:
ActivityStream now records when receptor_addresses
are peered to
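A sketch of that inference and of the resulting peer entry shape (names are
illustrative):

    def receptor_port_for(instance):
        # receptor_port comes from the first tcp receptor_address entry
        for addr in instance.receptor_addresses.all():
            if addr.protocol == 'tcp':
                return addr.port
        return None

    # a peer entry in group_vars/all.yml then carries just two fields:
    #   {"address": addr.address, "port": addr.port}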
Signed-off-by: Seth Foster <fosterbseth@gmail.com>
- write_receptor_config peers to ReceptorAddress entries
that have peers_from_control_nodes enabled
- peers_from_control_nodes and listener_port removed from Instance model
- peers_from_control_nodes added to ReceptorAddress model
- ReceptorAddress is now unique by address and protocol combination
- Write receptor config task is dispatched upon ReceptorAddress creation
or deletion, and when control node is first created
- InstanceLinkSerializer adds a target_address field and has logic
to grab the instance hostname associated with the peered ReceptorAddress
Signed-off-by: Seth Foster <fosterbseth@gmail.com>
Add post save and post delete hooks to
call write_receptor_config when
a receptor address is added / removed.
Add peers_from_control_nodes to
provision_instance
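The hooks are roughly of this shape (receiver name, import path, and dispatch
style are assumptions):

    from django.db.models.signals import post_save, post_delete
    from django.dispatch import receiver

    @receiver(post_save, sender=ReceptorAddress)
    @receiver(post_delete, sender=ReceptorAddress)
    def schedule_write_receptor_config(sender, instance, **kwargs):
        # regenerate the receptor config whenever an address is added or removed;
        # exactly how the task is dispatched is assumed here
        write_receptor_config.delay()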
Signed-off-by: Seth Foster <fosterbseth@gmail.com>
API changes
- cannot change peers or enable
peers_from_control_nodes on VM deployments
- allow setting ip_address
- use ip_address over hostname in the generated
group_vars/all.yml
- Drop api/v2/peers endpoint
DB changes
- add ip_address unique constraint, but ignore "" entries (see the sketch
  after this list)
Other changes
- provision_instance should take a listener_port option
Tests
- test that new controls don't disturb other peer relationships
- test ip_address over hostname
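One way to express "unique but ignore empty strings" as a conditional
constraint (sketch; the constraint name is made up):

    from django.db import models
    from django.db.models import Q

    # inside the Instance model:
    class Meta:
        constraints = [
            models.UniqueConstraint(
                fields=['ip_address'],
                name='unique_ip_address_when_set',
                condition=~Q(ip_address=''),
            ),
        ]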
* Remove committed_capacity field, delete supporting code
* Track consumed capacity to solve the negatives problem
* Use more verbose name for IG queryset
- Django's PostgreSQL JSONField wraps values in a JsonAdapter, so deal
with that when it happens. This goes away in Django 3.1.
- Setting related *_id fields clears the actual relation field, so
trying to fake objects for tests is a problem
- Instance.objects.me() was inappropriately creating stub objects
every time while running tests, but some of our tests now create
real db objects. Ditch that logic and use a proper fixture where needed.
- awxkit tox.ini was pinned at Python 3.8
* Select control node before start task
Consume capacity on control nodes for controlling tasks and consider
remaining capacity on control nodes before selecting them.
This depends on the requirement that control and hybrid nodes should all
be in the instance group named 'controlplane'. Many tests do not satisfy that
requirement. I'll update the tests in another commit.
* update tests to use controlplane
We don't start any tasks if we don't have a controlplane instance group.
Due to updates to fixtures, update tests to set node type and capacity
explicitly so they get the expected result.
* Fixes for accounting of control capacity consumed
The update method is used to account for currently consumed capacity for
instance groups in the in-memory capacity tracking data structure we
initialize in after_lock_init and then update via calculate_capacity_consumed
(both in task_manager.py).
Also update fit_task_to_instance to consider control impact on instances
Trust that these functions do the right thing when looking for a
node with capacity, and cut out the redundant check for the whole group's
capacity per Alan's recommendation.
* Refactor now redundant code
Deal with control type tasks before we loop over the preferred instance
groups, which cuts out the need for some redundant logic.
Also, fix a bug where I missed assigning the execution node in one case!
* set job explanation on tasks that need capacity
move the job explanation for jobs that need capacity to a function
so we can re-use it in the three places we need it.
* project updates always run on the controlplane
Instance group ordering makes no sense on project updates because they
always need to run on the control plane.
Also, since hybrid nodes should always run the control processes for the
jobs running on them as execution nodes, account for this when looking for an
execution node.
* fix misleading message
the variables and wording were both misleading; fix them to give a more
accurate description in the two different cases where this log may be emitted.
* use settings correctly
use settings.DEFAULT_CONTROL_PLANE_QUEUE_NAME instead of a hardcoded
name
cache the controlplane_ig object during after_lock_init to avoid
an unnecessary query
eliminate mistakenly duplicated AWX_CONTROL_PLANE_TASK_IMPACT and use
only AWX_CONTROL_NODE_TASK_IMPACT
* add test for control capacity consumption
add a test to verify that when there are 2 jobs and only capacity for one,
one moves into waiting and the other stays in pending
* add test for hybrid node capacity consumption
assert that the hybrid node is used for both control and execution and
capacity is deducted correctly
* add test for task.capacity_type = control
Test that control type tasks have the right capacity consumed and
get assigned to the right instance group
Also fix lint in the tests
* jobs_running not accurate for control nodes
We can either NOT use "idle instances" for control nodes, or we need
to update the jobs_running property on the Instance model to count
jobs where the node is the controller_node.
I didn't do that because it may be an expensive query, and it would be
hard to make it match with jobs_running on the InstanceGroup which
filters on tasks assigned to the instance group.
This change chooses to stop considering "idle" control nodes an option,
since we can't accurately identify them.
Without any change, we keep over-consuming capacity on control nodes
because this method sees all control nodes as "idle" at the beginning
of the task manager run, and then only counts jobs started in that run
in the in-memory tracking. So jobs which span several task manager runs
build up consumed capacity, which is accurately reported
via Instance.consumed_capacity.
* Reduce default task impact for control nodes
This is something we can experiment with as far as what users
want at install time, but start with just 1 for now.
* update capacity docs
Describe usage of the new setting and the concept of control impact.
Co-authored-by: Alan Rominger <arominge@redhat.com>
Co-authored-by: Rebeccah <rhunter@redhat.com>
- the list, detail, and health check API views should not include them
- the Instance-InstanceGroup association views should not allow them
to be changed
- the ping view excludes them
- list_instances management command excludes them
- Instance.set_capacity_value sets hop nodes to 0 capacity
- TaskManager will exclude them from the nodes available for job execution
- TaskManager.reap_jobs_from_orphaned_instances will consider hop nodes
to be orphaned instances
- The apply_cluster_membership_policies task will not manipulate hop nodes
- get_broadcast_hosts will ignore hop nodes
- active_count also will ignore hop nodes
This will allow us to control the default container group created via settings, meaning
we could set this in the operator and the default container group would get created with it applied.
We need this for https://github.com/ansible/awx-operator/issues/242
Deepmerge the default podspec and the override.
Without this, providing the `spec` for the podspec would override everything
it contains, including the container used, which is not desired.
Also, use the same deepmerge function definition, as the code appears to be
copy-pasted from the utils.
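A minimal sketch of the merge behavior (helper and variable names are
illustrative; the real change reuses the existing util's deepmerge):

    def deepmerge(base, override):
        # recursively merge override into base so a partial 'spec' override
        # does not wipe out the default container definition
        if isinstance(base, dict) and isinstance(override, dict):
            merged = dict(base)
            for key, value in override.items():
                merged[key] = deepmerge(base[key], value) if key in base else value
            return merged
        return override

    # pod_spec = deepmerge(default_pod_spec, user_supplied_override)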
* Clean up added work_type processing for mesh_code branch
* track both execution and control capacity
* Remove unused execution_capacity property
* Count all forms of capacity to make test pass
* Force jobs to be on execution nodes, updates on control nodes
* Introduce capacity_type property to abstract some details out
* Update test to cover all job types at same time
* Register OpenShift nodes as control types
* Remove unqualified consumed_capacity from task manager and make unit tests work
* Remove unqualified consumed_capacity from task manager and make unit tests work
* Update unit test to execution vs control TM logic changes
* Fix bug, else handling for work_type method
* Model changes for instance last_seen field to replace modified
* Break up refresh_capacity into smaller units
* Rename execution node methods, fix last_seen clustering
* Use update_fields to make it clear save only affects capacity
* Restructing to pass unit tests
* Fix bug where a PATCH did not update capacity value
keep pre-upgrade events in an old table (instead of a partition)
- instead of creating a default partition, keep all events in special
"unpartitioned" tables
- track these tables via distinct proxy=true models
- when generating the queryset for a UnifiedJob's events, look at the
creation date of the job; if it's before the date of the migration,
  query the old unpartitioned table, otherwise use the newer partitioned table
  that provides auto-partitioning (a sketch follows this list)
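The queryset selection then reduces to roughly this (model names and how the
migration date is looked up are assumptions):

    # PARTITION_MIGRATION_DATE: when the event-partitioning migration was
    # applied (how this is stored/looked up is an assumption)

    def get_event_queryset(unified_job):
        if unified_job.created < PARTITION_MIGRATION_DATE:
            # pre-upgrade job: events live in the old, unpartitioned table
            return UnpartitionedJobEvent.objects.filter(job=unified_job)
        # post-upgrade job: events live in the auto-partitioned table
        return JobEvent.objects.filter(job=unified_job)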