* Add no cov on fail flag to fix CI
* Bump django to 4.2.21
* Coverage args
* Try cov equals awx
* Run migration test in parallel
* Ignore error and clean up commands
* Try to make schema check not hang
---------
Co-authored-by: Satoe Imaishi <simaishi@redhat.com>
* Clean up logging when receptor not running
* Make logging more concise for unregistered case
* Silence another unwanted traceback
* Silence a few more tracebacks
* removing the requirement for re and changing to startswith which the other AAP collections use
* telling sonarqube to ignore this line
* fixing lint error
Fallback to basic auth if OAUTH to console.redhat.com fails
Notes:
Envoy has a timeout of 30 seconds, so
the total timeout should be less than that.
(5, 20) means 5 seconds to connect to server,
20 seconds to start reading data.
Signed-off-by: Seth Foster <fosterbseth@gmail.com>
---------
Signed-off-by: Seth Foster <fosterbseth@gmail.com>
* Fix bug where collectstatic could error due to dispatcherd config
* Revert test because it will not work in test suite
* New publish mocking system
* Remove import of unused
* Fix default publish broker
* Fast fix for old version nodejs
Fixing error
required: { node: '^18.0.0 || >=20.0.0' },
current: { node: 'v16.13.1', npm: '8.5.0' }
* Use node js 18 by default to align with official docs
---------
Co-authored-by: Seth Foster <fosterseth@users.noreply.github.com>
Use dynamic AWX max_workers value
Make basic --status and --running commands work
Make feature flag enabled true by default for development
* [dispatcherd] Dispatcher socket-based `--status` demo working (#15908)
* Fix Task Decorator to Work With and Without Feature Flag (AAP-41775) (#15911)
* refactor(system): extract common heartbeat helpers and split cluster_node_heartbeat
Extract common heartbeat logic into helper functions: _heartbeat_instance_management: consolidates instance management, health checks, and lost-instance detection. _heartbeat_check_versions: compares instance versions and initiates shutdown when necessary. _heartbeat_handle_lost_instances: reaps jobs and marks lost instances offline.
Refactor the original cluster_node_heartbeat to use these helpers and retain legacy behavior (using bind_kwargs).
Introduce adispatch_cluster_node_heartbeat for dispatcherd: uses the control API to retrieve running tasks and reaps them.
Link the two implementations by attaching adispatch_cluster_node_heartbeat as the _new_method on cluster_node_heartbeat.
* feat(publish): delegate heartbeat task submission to new dispatcherd implementation
Update apply_async to check at runtime if FEATURE_NEW_DISPATCHER is enabled.
When the task is cluster_node_heartbeat and a _new_method is attached, delegate the task submission to the new dispatcherd implementation.
Preserve the original behavior for all other tasks and fallback on error.
* refactor(system): extract task ID retrieval from dispatcherd into helper function
Improves readability of adispatch_cluster_node_heartbeat by extracting
the complex UUID parsing logic into a dedicated helper function.
Adds clearer error handling and follows established code patterns.
* fix(dispatcher): Enable task decorator to work with and without feature flag
Implemented a new approach for handling task execution with feature flags
by attaching alternative implementations to apply_async._new_method. This
allows cluster_node_heartbeat to work correctly with both the legacy and
new dispatcher systems without modifying core decorator logic.
AAP-41775
* fix(dispatcher): Improve error handling and logging in feature flag implementation
- Add error handling when attaching alternative dispatcher implementation
- Fix method self-reference in apply_async to properly use cls.apply_async
- Document limitations of this targeted approach for specific tasks
- Add logging for better debugging of dispatcher selection
- Ensure decorator timing by keeping method attachment after function definitions
This completes the robust implementation for switching between dispatcher
implementations based on feature flags.
AAP-41775
* fix(dispatcher): Implement registry pattern for dispatcher feature flag compatibility
Replaces direct method attribute assignment with a global registry for
alternative implementations. The original approach tried to attach new
methods directly to apply_async bound methods, which fails because bound
methods don't support attribute assignment in Python.
The registry pattern:
- Creates a global ALTERNATIVE_TASK_IMPLEMENTATIONS dict in publish.py
- Registers alternative implementations by task name
- Modifies apply_async to check the registry when feature flag is enabled
- Adds extensive logging throughout the process for debugging
This enables cluster_node_heartbeat to work correctly with both the legacy
and new dispatcher implementations based on the FEATURE_NEW_DISPATCHER flag.
AAP-41775
* refactor(dispatcher): Remove excessive logging from dispatcher implementation
Reduces verbose debugging logs while maintaining essential logging for critical
operations. Preserves:
- Task implementation selection based on feature flag
- Registration success/failure messages
- Critical error reporting
Removed:
- Registry content debugging messages
- Repetitive task diagnostics
- Non-essential information logging
AAP-41775
* fix(dispatcher): Fix shallow copy in dispatcher schedule conversion
This resolves "AttributeError: 'float' object has no attribute 'total_seconds'"
errors when the dispatcher is restarted.
Refs: AAP-41775
* Use IPC mechanism to get running tasks (#15926)
* Allow tasks from tasks
* Fix failure to limit to waiting jobs
* Get job record with lock
* Fix failures in dispatcherd feature branch (#15930)
* Fully handle DispatcherCancel
* Complete rest of preload import work
* Complete dispatcherd integration & job cancellation (AAP-43033) (#15941)
* feat(dispatcher): Implement job cancellation for new dispatcher
Adds feature-flag-aware job cancellation that routes cancel requests to either
the legacy dispatcher or the new dispatcherd library based on the
FEATURE_NEW_DISPATCHER flag.
- Updates cancel_dispatcher_process() to use dispatcherd's control API when enabled
- Handles both direct cancellation and task manager workflow cancellation cases
- Works with DispatcherCancel exception handling to properly handle SIGUSR1 signals
AAP-43033
* fix(dispatcher): Update run_dispatcher.py to properly handle task cancellation
Modifies the cancel command in run_dispatcher.py to properly cancel tasks
when the FEATURE_NEW_DISPATCHER flag is enabled, rather than just listing
running tasks.
The implementation translates each task UUID to the appropriate
filter format expected by the dispatcherd control API, maintaining the same
behavior as the original implementation.
Part of: AAP-43033
* refactor(system): Refactor dispatch_startup() to extract common startup logic and branch based on feature flag
This commit refactors the dispatch_startup() function to improve clarity and consistency across the legacy
and new dispatcherd flows.
No dispatcher-specific functionality is needed beyond the changes made, so this refactoring improves robustness without
altering core behavior.
* refactor(system): Refactor inform_cluster_of_shutdown() for clarity
* refactor(tasks): Replace @task with @task_awx across 22 tasks for dispatcher compatibility
- Migrated all task decorators to use @task_awx, ensuring dispatcher-aware behavior.
- Tested each task with the new dispatcherd, verifying that tasks using the registry pattern execute correctly without needing binder‐based alternative implementations.
- Removed redundant logging and outdated comments.
- Legacy tasks that do not require special parameter extraction continue to use their original logic.
- This commit reflects our complete journey of testing and verifying dispatcherd compatibility across all 22 tasks.
* refactor(publish): fix linter
* Fix bug from the branch rebase
* AAP-43763 Add tests for connection management in dispatcherd workers (#15949)
* Add test for job cancel in live tests
* Fix bug from the branch rebase
* Add test for connection recovery after connection broke
* Add test for breaking connection
* Fix dispatcherd bugs: schedule aliases, job kwargs handling, cancel handling (#15960)
* Put in job kwargs handling, not done before
* AAP-44382 [dispatcherd] Fixes for running with feature flag off (#15973)
* Use correct decorator for test of tasks
* Finalize dispatcherd feature branch (#15975)
* Work dispatcherd into dependency management system
* Use util methods from DAB
* Rename the dispatcherd feature flag, and flip default to not-enabled
* Move to new submit_task method
* Update the location of the sock file
* AAP-44381 Make dispatcherd config loading more lazy (#15979)
* Make dispatcherd config loading more lazy
* Make submission error more obvious
* Fix signal handling gap, hijack SIGUSR1 from dispatcherd (#15983)
* Fix signal handling gap, hijack SIGUSR1 from dispatcherd
* Minor adjustments to dispatcherd status command
* [dispatcherd] Get rid of alternative task registry (#15984)
Get rid of alternative task registry
* Fix deadlock error and other cleanup errors (#15987)
* Move to proper error handling location
---------
Co-authored-by: artem_tiupin <70763601+art-tapin@users.noreply.github.com>
* add license expiry metric
* Update metrics test with default value to the new license metrics
* Add the changes of the black-lint command
* Update awx/main/analytics/metrics.py
---------
Co-authored-by: Seth Foster <fosterseth@users.noreply.github.com>
With the "recent" changes making the lookup plugin `awx.awx.schedule_rrule` and
`awx.awx.schedule_rruleset` returning a list instead of string (see #15625), the
returned list (which will *always* carry only 1 item) needs to be transformed
to a string either adding `| join` or `| first`. I found `first` to be more
fitting as the list will *always* return a list with 1 item.
Additionally, the documentation that references `awx.awx.schedule_rruleset`
in the `awx.awx.schedule` module was wrong, which is also fixed by this PR.
Signed-off-by: Steffen Scheib <sscheib@redhat.com>
Co-authored-by: Steffen Scheib <steffen@scheib.me>
Retire workers after a certain age, allowing them to finish their
current task if they are not idle.
This mitigates any issues like memory leaks in long running workers,
especially if systems stay busy for months at a time.
Introduce new optional setting WORKER_MAX_LIFETIME_SECONDS, defaulting to 4 hours.
* Make the JT name uniqueness enforced at the database level
* Forgot demo project fixture
* New approach, done by adding a new field
* Update for linters and failures
* Fix logical error in migration test
* Revert some test changes based on review comment
* Do not rename first template, add test
* Avoid name-too-long rename errors
* Insert migration into place
* Move existing files with git
* Bump migrations of existing
* Update migration test
* Awkward bump
* Fix migration file link
* update test reference again
* Revise start_fact_cache and finish_fact_cache to use JSON file with host list inside it
* Revise artifacts path to be relative to the job private_data_dir
* Update calls to start_fact_cache and finish_fact_cache to agree with new reference to artifacts_dir
* Prevents unnecessary updates to ansible_facts_modified, fixing timestamp-related test failures.
* Delete existing all-group vars on inventory sync (with overwrite-vars=True) instead of merging them.
* Implementation of inv var handling with file as db.
* Improve serialization to file of inv vars for src update
* Include inventory-level variable editing into inventory source update handling
* Add group vars to inventory source update handling
* Add support for overwrite_vars to new inventory source handling
* Persist inventory var history in the database instead of a file.
* Remove logging which was needed during development.
* Remove further debugging code and improve comments
* Move special handling for user edits of variables into serializers
* Relate the inventory variable history model to its inventory
* Allow for inventory variables to have the value 'None'
* Fix KeyError in new inventory variable handling
* Add unique-together constraint for new model InventoryGroupVariablesWithHistory
* Use only one special invsrc_id for initial update and manual updates
* Fix internal server error when creating a new inventory
* Print the empty string for a variable with value 'None'
* Fix comment which incorrectly states old behaviour
* Fix inventory_group_variables_update tests which did not take the new handling of None into account
* Allow any type for Ansible-core variable values
* Refactor misleading method names
* Fix internal server error when savig vars from group form
* Remove superfluous json conversion in front of JSONField
* Call variable update from create/update instead from validate
* Use group_id instead of group_name in model InventoryGroupVariablesWithHistory
* Disable new variable update handling for all regular (non-'all') groups
* Add live test to verify AAP-17690 (inv var deleted from source)
* Add functional tests to verify inventory variables update logic
* Fix migration which was corrupted by a rebase
* Add a more complex live test and resolve linter complaints
* Force overwrite_vars=False for updates from source on all-group
* Change behavior with respect to overwrite_vars
Ensure service account authentication is being used
when falling back to using SUBSCRIPTIONS_CLIENT_ID.
Additional change:
Subscription data can return two types of capacities:
Sockets and Nodes
For determining overall capacity
if capacity name is Nodes:
capacity quantity x subscription quantity
if capacity name is Sockets:
capacity quantity / 2 (minimum of 1) x subscription quantity
Signed-off-by: Seth Foster <fosterbseth@gmail.com>
* Demo of sorting hosts live test
* Sort both bulk updates and add batch size to facts bulk update to resolve deadlock issue
* Update tests to expect batch_size to agree with changes
* Add utility method to bulk update and sort hosts and applied that to the appropriate locations
Remove unused imports
Add utility method for sorting bulk updates
Remove try except OperationalError for loop
Remove unused import of django.db.OperationalError
Remove batch size as it is now on the bulk update utility method as 100
Remove batch size here since it is specified in sortedbulkupdate
Add transaction.atomic to have entire transaction is run as a signle transaction before committing to the db
Revert change to bulk update as it's not needed here and just sort instead
Move bulk_sorted utility method into db.py and updated name to not be specific to Hosts
Revise to import bulk_update_sorted.. rather than calling it as an argument
Fix way I'm importing bulk_update_sorted.. Remove unneeded Host import and remove calls to bul_update as args
Rebise calls to bulk_update_sorted.. to include Host in the args
REmove raw_update_hosts method and replace with bulk_update_sorted_by_id in update_hosts
Remove update_hosts function and replace with bulk_update_sorted_by_id
Update live tests to use bulk_update_sorted_by_id
Fix the fields in bulk_update to agree with test
* Update functional tests to use bulk_update_sorted_by_id since update_hosts has been deleted
Replace update_hosts with bulk_update_sorted_by_id
Remove referenes to update_hosts
Update corresponding fact cachin tests to use bulk_update_sorted_by_id
Remove import of bulk_sorted_update
Add code comment to live test to silence Sonarqube hotspot
* Add comment NOSONAR to get rid of Sonarqube warning since this is just a test and it's not actually a security issue
Get test_finish_job_fact_cache_with_existing_data passing
Get test_finish_job_fact_cache_clear passing
Remove reference to raw_update and replace with new bulk update utility method
Add pytest.mark.django_db to appropriate tests
Corrent which model is called in bulk_update_sorted_by_id
Remove now unused Host import
Point to where bulk_update_sorted_by_id to where that is actually being used
Correct import of bulk_update_sorted_by_id
Revert changes in this file to avoid db calls issue
Remove @pytest.mark.django_db from unit tests
Remove commented out host sorting suggested fix
Fix failing tests test_pre_post_run_hook_facts_deleted_sliced & test_pre_post_run_hook_facts
Remove atomic transaction line, add return, and add docstring
* Fix failing test test_finish_job_fact_cache_clear & test_finish_job_fact_cache_with_existing_data
---------
Co-authored-by: Alan Rominger <arominge@redhat.com>
Update code to pull subscriptions from
console.redhat.com instead of
subscription.rhsm.redhat.com
Uses service account client ID and client secret
instead of username/password, which is being
deprecated in July 2025.
Additional changes:
- In awx.awx.subscriptions module, use new service
account params rather than old basic auth params
- Update awx.awx.license module to use subscription_id
instead of pool_id. This is due to using a different API,
which identifies unique subscriptions by subscriptionID
instead of pool ID.
Signed-off-by: Seth Foster <fosterbseth@gmail.com>
Co-authored-by: Chris Meyers <chris.meyers.fsu@gmail.com>
Co-authored-by: Peter Braun <pbraun@redhat.com>
* Django url validators support ipv6, our custom URLField
allow_plain_hostname feature was messing with the hostname before it
was passed to Django validators.
* This change dodges the allow_plan_hostname transformations if the
value looks like it's an ipv6 one.
* Dump running tasks when running out of capacity
* Use same logic for max_workers and capacity
* Address case where CPU capacity is the constraint
* Add a test for correspondence
* Fake redis to make tests work
* Added test_jobs.py to the model unit test folder in orther to show the undesired behaviour with fact cache
Signed-off-by: onetti7 <davonebat@gmail.com>
* Added test_jobs.py to the model unit test folder in orther to show the undesired behaviour with fact cache
Signed-off-by: onetti7 <davonebat@gmail.com>
* Solved undesired behaviour with fact_cache
Signed-off-by: onetti7 <davonebat@gmail.com>
* Solved bug with slices
Signed-off-by: onetti7 <davonebat@gmail.com>
* Remove unused imports
Remove now unused line of code which was commented out by the contributor
Revert "Remove now unused line of code which was commented out by the contributor"
This reverts commit f1a056a2356d56bc7256957a18503bd14dcfd8aa.
* Add back line that had been commented out as this line makes hosts specific to the particular slice when applicable
Revise private_data_dir fixture to see if it improves code coverage
Checked out awx/main/tests/unit/models/test_jobs.py in devel to see if it resolves git diff issue
* Fix formatting in awx/main/tests/unit/models/test_jobs.py
Rename for loop from host in hosts to hosts in hosts_cahced and remove unneeded continue
Revise finish_fact_cache to utilize inventory rather than hosts
Remove local var hosts that was assigned but unused
Revert change in start_fact_cache hosts_cached back to hosts
Revise the way we are handling hosts_cached and joining the file
Revert "Revise the way we are handling hosts_cached and joining the file"
This reverts commit e6e3d2f09c1b79a9bce3647a72e3dad97fe0aed8.
Reapply "Revise the way we are handling hosts_cached and joining the file"
This reverts commit a42b7ae69133fee24d3a5f1b456d9c343d111df9.
Revert some of my changes to get back to a better working state
Rename for loop to host in hosts_cached and remove unneeded continue
Remove jobs job.get_hosts_for_fact_cache() from post run hook, fix if statement after continue block, and revise how we are calling hosts in finish for loop
Add test_invalid_host_facts to test_jobs to increase code coverage
Update method signature to use hosts_cached and updated other references to hosts in finish_facts_cached
Rename hosts iterator to hosts_cached to agree with naming elsewhere
Revise test_invalid_host_facts to get more code coverage
Revise test_invalid_host_facts to increase codecov
Revise test_pre_post_run_hook_facts_deleted_sliced to ensure we are hitting the assertionerror for code cov
Revise mock_inventory.hosts. to hit assert failure
Add revision of hosts and facts to force failure to satisfy code cov
Fix failure in test_pre_post_run_hook_facts_deleted_sliced
Add back for loop to create failures and add assert to hit them
Remove hosts.iterator() from both start_fact_cache and finish_fact_cache
Remove unused import of Queryset to satisfy api-lint
Fix typo in docstring hasnot to has not
Move hosts_cached.append(host) to outer loop in start_fact_cache
Add class to help support cached hosts resolving host.name issue with hosts_cached
* Add live tests for ansible facts
Remove fixture needed for local work only maybe
Revert "Add class to help support cached hosts resolving host.name issue with hosts_cached"
This reverts commit 99d998cfb9960baafe887de80bd9b01af50513ec.
* Move hosts_cached.append(host) outside of try except
* Move hosts_cached.append(host) to the beginning of start_fact_cache
---------
Signed-off-by: onetti7 <davonebat@gmail.com>
Co-authored-by: onetti7 <davonebat@gmail.com>
Co-authored-by: Alan Rominger <arominge@redhat.com>
Bug Error reporting and handling in GH14575/GH12682
This targets a bug that tries to parse blank string as None for panelid
and dashboardid.
It also prints more errors reporting to the console to diagnose
reporting issues
Co-authored-by: Lila Yasin <lyasin@redhat.com>
* Prevent job pod from mounting serviceaccount token
* Add serializer validation for cg pod_spec_override
Prevent automountServiceAccountToken to be set to true and provide an error message when automountServiceAccountToken is being set to true