* Update the workflow to allow the action to write our branches from it.
* Also added username and email as git by default will want to know who
is performing the action (edge case). Using github actions bot is
standard practice
* 🧪 Unpersist Git creds @ cov combine job
This is one of the things Zizmor [[1]] warns about.
[1]: https://docs.zizmor.sh
* 🧪 Download all coverage artifacts in one go
* 🧪 Delegate artifact garbage collection to GH
This is implemented by setting the retention days input to 1 on the
initial upload.
Co-authored-by: 🇺🇦 Sviatoslav Sydorenko (Святослав Сидоренко) <webknjaz@redhat.com>
* 🧪 Unpersist Git creds @ cov combine job
This is one of the things Zizmor [[1]] warns about.
[1]: https://docs.zizmor.sh
* 🧪 Download all coverage artifacts in one go
* 🧪 Delegate artifact garbage collection to GH
This is implemented by setting the retention days input to 1 on the
initial upload.
```
/var/lib/awx/venv/awx/lib64/python3.11/site-packages/_pytest/python.py:163:
PytestReturnNotNoneWarning: Expected None, but
awx/main/tests/unit/test_tasks.py::TestJobCredentials::test_custom_environment_injectors_with_boolean_extra_vars
returned ['successful', 0], which will be an error in a future version
of pytest. Did you mean to use `assert` instead of `return`?
```
* Dug into the git blame for this one
060585434abb5456935b7378211813b2ceaacaaa is the commit for any
historians. It was wrongfully carried over from a mock pexpect
implementation. Our new tests are nice. They don't go as far as trying
to run the task so they do not need to mock pexpect. That is why it is
safe to remove this code without finding it a new home.
Co-authored-by: Chris Meyers <chris.meyers.fsu@gmail.com>
* Make the JT name uniqueness enforced at the database level
* Forgot demo project fixture
* New approach, done by adding a new field
* Update for linters and failures
* Fix logical error in migration test
* Revert some test changes based on review comment
* Do not rename first template, add test
* Avoid name-too-long rename errors
* Insert migration into place
* Move existing files with git
* Bump migrations of existing
* Update migration test
* Awkward bump
* Fix migration file link
* update test reference again
When POSTing to console.redhat.com, fallback
to using basic auth method if OAUTH via
service accounts fails
Signed-off-by: Seth Foster <fosterbseth@gmail.com>
* Add no cov on fail flag to fix CI
* Bump django to 4.2.21
* Coverage args
* Try cov equals awx
* Run migration test in parallel
* Ignore error and clean up commands
* Try to make schema check not hang
---------
Co-authored-by: Satoe Imaishi <simaishi@redhat.com>
* Clean up logging when receptor not running
* Make logging more concise for unregistered case
* Silence another unwanted traceback
* Silence a few more tracebacks
* removing the requirement for re and changing to startswith which the other AAP collections use
* telling sonarqube to ignore this line
* fixing lint error
Fallback to basic auth if OAUTH to console.redhat.com fails
Notes:
Envoy has a timeout of 30 seconds, so
the total timeout should be less than that.
(5, 20) means 5 seconds to connect to server,
20 seconds to start reading data.
Signed-off-by: Seth Foster <fosterbseth@gmail.com>
---------
Signed-off-by: Seth Foster <fosterbseth@gmail.com>
* Fix bug where collectstatic could error due to dispatcherd config
* Revert test because it will not work in test suite
* New publish mocking system
* Remove import of unused
* Fix default publish broker
* Delete existing all-group vars on inventory sync (with overwrite-vars=True) instead of merging them.
* Implementation of inv var handling with file as db.
* Improve serialization to file of inv vars for src update
* Include inventory-level variable editing into inventory source update handling
* Add group vars to inventory source update handling
* Add support for overwrite_vars to new inventory source handling
* Persist inventory var history in the database instead of a file.
* Remove logging which was needed during development.
* Remove further debugging code and improve comments
* Move special handling for user edits of variables into serializers
* Relate the inventory variable history model to its inventory
* Allow for inventory variables to have the value 'None'
* Fix KeyError in new inventory variable handling
* Add unique-together constraint for new model InventoryGroupVariablesWithHistory
* Use only one special invsrc_id for initial update and manual updates
* Fix internal server error when creating a new inventory
* Print the empty string for a variable with value 'None'
* Fix comment which incorrectly states old behaviour
* Fix inventory_group_variables_update tests which did not take the new handling of None into account
* Allow any type for Ansible-core variable values
* Refactor misleading method names
* Fix internal server error when savig vars from group form
* Remove superfluous json conversion in front of JSONField
* Call variable update from create/update instead from validate
* Use group_id instead of group_name in model InventoryGroupVariablesWithHistory
* Disable new variable update handling for all regular (non-'all') groups
* Add live test to verify AAP-17690 (inv var deleted from source)
* Add functional tests to verify inventory variables update logic
* Fix migration which was corrupted by a rebase
* Add a more complex live test and resolve linter complaints
* Force overwrite_vars=False for updates from source on all-group
* Change behavior with respect to overwrite_vars
* Fast fix for old version nodejs
Fixing error
required: { node: '^18.0.0 || >=20.0.0' },
current: { node: 'v16.13.1', npm: '8.5.0' }
* Use node js 18 by default to align with official docs
---------
Co-authored-by: Seth Foster <fosterseth@users.noreply.github.com>
Use dynamic AWX max_workers value
Make basic --status and --running commands work
Make feature flag enabled true by default for development
* [dispatcherd] Dispatcher socket-based `--status` demo working (#15908)
* Fix Task Decorator to Work With and Without Feature Flag (AAP-41775) (#15911)
* refactor(system): extract common heartbeat helpers and split cluster_node_heartbeat
Extract common heartbeat logic into helper functions: _heartbeat_instance_management: consolidates instance management, health checks, and lost-instance detection. _heartbeat_check_versions: compares instance versions and initiates shutdown when necessary. _heartbeat_handle_lost_instances: reaps jobs and marks lost instances offline.
Refactor the original cluster_node_heartbeat to use these helpers and retain legacy behavior (using bind_kwargs).
Introduce adispatch_cluster_node_heartbeat for dispatcherd: uses the control API to retrieve running tasks and reaps them.
Link the two implementations by attaching adispatch_cluster_node_heartbeat as the _new_method on cluster_node_heartbeat.
* feat(publish): delegate heartbeat task submission to new dispatcherd implementation
Update apply_async to check at runtime if FEATURE_NEW_DISPATCHER is enabled.
When the task is cluster_node_heartbeat and a _new_method is attached, delegate the task submission to the new dispatcherd implementation.
Preserve the original behavior for all other tasks and fallback on error.
* refactor(system): extract task ID retrieval from dispatcherd into helper function
Improves readability of adispatch_cluster_node_heartbeat by extracting
the complex UUID parsing logic into a dedicated helper function.
Adds clearer error handling and follows established code patterns.
* fix(dispatcher): Enable task decorator to work with and without feature flag
Implemented a new approach for handling task execution with feature flags
by attaching alternative implementations to apply_async._new_method. This
allows cluster_node_heartbeat to work correctly with both the legacy and
new dispatcher systems without modifying core decorator logic.
AAP-41775
* fix(dispatcher): Improve error handling and logging in feature flag implementation
- Add error handling when attaching alternative dispatcher implementation
- Fix method self-reference in apply_async to properly use cls.apply_async
- Document limitations of this targeted approach for specific tasks
- Add logging for better debugging of dispatcher selection
- Ensure decorator timing by keeping method attachment after function definitions
This completes the robust implementation for switching between dispatcher
implementations based on feature flags.
AAP-41775
* fix(dispatcher): Implement registry pattern for dispatcher feature flag compatibility
Replaces direct method attribute assignment with a global registry for
alternative implementations. The original approach tried to attach new
methods directly to apply_async bound methods, which fails because bound
methods don't support attribute assignment in Python.
The registry pattern:
- Creates a global ALTERNATIVE_TASK_IMPLEMENTATIONS dict in publish.py
- Registers alternative implementations by task name
- Modifies apply_async to check the registry when feature flag is enabled
- Adds extensive logging throughout the process for debugging
This enables cluster_node_heartbeat to work correctly with both the legacy
and new dispatcher implementations based on the FEATURE_NEW_DISPATCHER flag.
AAP-41775
* refactor(dispatcher): Remove excessive logging from dispatcher implementation
Reduces verbose debugging logs while maintaining essential logging for critical
operations. Preserves:
- Task implementation selection based on feature flag
- Registration success/failure messages
- Critical error reporting
Removed:
- Registry content debugging messages
- Repetitive task diagnostics
- Non-essential information logging
AAP-41775
* fix(dispatcher): Fix shallow copy in dispatcher schedule conversion
This resolves "AttributeError: 'float' object has no attribute 'total_seconds'"
errors when the dispatcher is restarted.
Refs: AAP-41775
* Use IPC mechanism to get running tasks (#15926)
* Allow tasks from tasks
* Fix failure to limit to waiting jobs
* Get job record with lock
* Fix failures in dispatcherd feature branch (#15930)
* Fully handle DispatcherCancel
* Complete rest of preload import work
* Complete dispatcherd integration & job cancellation (AAP-43033) (#15941)
* feat(dispatcher): Implement job cancellation for new dispatcher
Adds feature-flag-aware job cancellation that routes cancel requests to either
the legacy dispatcher or the new dispatcherd library based on the
FEATURE_NEW_DISPATCHER flag.
- Updates cancel_dispatcher_process() to use dispatcherd's control API when enabled
- Handles both direct cancellation and task manager workflow cancellation cases
- Works with DispatcherCancel exception handling to properly handle SIGUSR1 signals
AAP-43033
* fix(dispatcher): Update run_dispatcher.py to properly handle task cancellation
Modifies the cancel command in run_dispatcher.py to properly cancel tasks
when the FEATURE_NEW_DISPATCHER flag is enabled, rather than just listing
running tasks.
The implementation translates each task UUID to the appropriate
filter format expected by the dispatcherd control API, maintaining the same
behavior as the original implementation.
Part of: AAP-43033
* refactor(system): Refactor dispatch_startup() to extract common startup logic and branch based on feature flag
This commit refactors the dispatch_startup() function to improve clarity and consistency across the legacy
and new dispatcherd flows.
No dispatcher-specific functionality is needed beyond the changes made, so this refactoring improves robustness without
altering core behavior.
* refactor(system): Refactor inform_cluster_of_shutdown() for clarity
* refactor(tasks): Replace @task with @task_awx across 22 tasks for dispatcher compatibility
- Migrated all task decorators to use @task_awx, ensuring dispatcher-aware behavior.
- Tested each task with the new dispatcherd, verifying that tasks using the registry pattern execute correctly without needing binder‐based alternative implementations.
- Removed redundant logging and outdated comments.
- Legacy tasks that do not require special parameter extraction continue to use their original logic.
- This commit reflects our complete journey of testing and verifying dispatcherd compatibility across all 22 tasks.
* refactor(publish): fix linter
* Fix bug from the branch rebase
* AAP-43763 Add tests for connection management in dispatcherd workers (#15949)
* Add test for job cancel in live tests
* Fix bug from the branch rebase
* Add test for connection recovery after connection broke
* Add test for breaking connection
* Fix dispatcherd bugs: schedule aliases, job kwargs handling, cancel handling (#15960)
* Put in job kwargs handling, not done before
* AAP-44382 [dispatcherd] Fixes for running with feature flag off (#15973)
* Use correct decorator for test of tasks
* Finalize dispatcherd feature branch (#15975)
* Work dispatcherd into dependency management system
* Use util methods from DAB
* Rename the dispatcherd feature flag, and flip default to not-enabled
* Move to new submit_task method
* Update the location of the sock file
* AAP-44381 Make dispatcherd config loading more lazy (#15979)
* Make dispatcherd config loading more lazy
* Make submission error more obvious
* Fix signal handling gap, hijack SIGUSR1 from dispatcherd (#15983)
* Fix signal handling gap, hijack SIGUSR1 from dispatcherd
* Minor adjustments to dispatcherd status command
* [dispatcherd] Get rid of alternative task registry (#15984)
Get rid of alternative task registry
* Fix deadlock error and other cleanup errors (#15987)
* Move to proper error handling location
---------
Co-authored-by: artem_tiupin <70763601+art-tapin@users.noreply.github.com>
Retire workers after a certain age, allowing them to finish their
current task if they are not idle.
This mitigates any issues like memory leaks in long running workers,
especially if systems stay busy for months at a time.
Introduce new optional setting WORKER_MAX_LIFETIME_SECONDS, defaulting to 4 hours
* add license expiry metric
* Update metrics test with default value to the new license metrics
* Add the changes of the black-lint command
* Update awx/main/analytics/metrics.py
---------
Co-authored-by: Seth Foster <fosterseth@users.noreply.github.com>
With the "recent" changes making the lookup plugin `awx.awx.schedule_rrule` and
`awx.awx.schedule_rruleset` returning a list instead of string (see #15625), the
returned list (which will *always* carry only 1 item) needs to be transformed
to a string either adding `| join` or `| first`. I found `first` to be more
fitting as the list will *always* return a list with 1 item.
Additionally, the documentation that references `awx.awx.schedule_rruleset`
in the `awx.awx.schedule` module was wrong, which is also fixed by this PR.
Signed-off-by: Steffen Scheib <sscheib@redhat.com>
Co-authored-by: Steffen Scheib <steffen@scheib.me>
Retire workers after a certain age, allowing them to finish their
current task if they are not idle.
This mitigates any issues like memory leaks in long running workers,
especially if systems stay busy for months at a time.
Introduce new optional setting WORKER_MAX_LIFETIME_SECONDS, defaulting to 4 hours.
* Make the JT name uniqueness enforced at the database level
* Forgot demo project fixture
* New approach, done by adding a new field
* Update for linters and failures
* Fix logical error in migration test
* Revert some test changes based on review comment
* Do not rename first template, add test
* Avoid name-too-long rename errors
* Insert migration into place
* Move existing files with git
* Bump migrations of existing
* Update migration test
* Awkward bump
* Fix migration file link
* update test reference again
* Revise start_fact_cache and finish_fact_cache to use JSON file with host list inside it
* Revise artifacts path to be relative to the job private_data_dir
* Update calls to start_fact_cache and finish_fact_cache to agree with new reference to artifacts_dir
* Prevents unnecessary updates to ansible_facts_modified, fixing timestamp-related test failures.
* Delete existing all-group vars on inventory sync (with overwrite-vars=True) instead of merging them.
* Implementation of inv var handling with file as db.
* Improve serialization to file of inv vars for src update
* Include inventory-level variable editing into inventory source update handling
* Add group vars to inventory source update handling
* Add support for overwrite_vars to new inventory source handling
* Persist inventory var history in the database instead of a file.
* Remove logging which was needed during development.
* Remove further debugging code and improve comments
* Move special handling for user edits of variables into serializers
* Relate the inventory variable history model to its inventory
* Allow for inventory variables to have the value 'None'
* Fix KeyError in new inventory variable handling
* Add unique-together constraint for new model InventoryGroupVariablesWithHistory
* Use only one special invsrc_id for initial update and manual updates
* Fix internal server error when creating a new inventory
* Print the empty string for a variable with value 'None'
* Fix comment which incorrectly states old behaviour
* Fix inventory_group_variables_update tests which did not take the new handling of None into account
* Allow any type for Ansible-core variable values
* Refactor misleading method names
* Fix internal server error when savig vars from group form
* Remove superfluous json conversion in front of JSONField
* Call variable update from create/update instead from validate
* Use group_id instead of group_name in model InventoryGroupVariablesWithHistory
* Disable new variable update handling for all regular (non-'all') groups
* Add live test to verify AAP-17690 (inv var deleted from source)
* Add functional tests to verify inventory variables update logic
* Fix migration which was corrupted by a rebase
* Add a more complex live test and resolve linter complaints
* Force overwrite_vars=False for updates from source on all-group
* Change behavior with respect to overwrite_vars
Ensure service account authentication is being used
when falling back to using SUBSCRIPTIONS_CLIENT_ID.
Additional change:
Subscription data can return two types of capacities:
Sockets and Nodes
For determining overall capacity
if capacity name is Nodes:
capacity quantity x subscription quantity
if capacity name is Sockets:
capacity quantity / 2 (minimum of 1) x subscription quantity
Signed-off-by: Seth Foster <fosterbseth@gmail.com>
* Update subscription API to use service accounts
Update code to pull subscriptions from
console.redhat.com instead of
subscription.rhsm.redhat.com
Uses service account client ID and client secret
instead of username/password, which is being
deprecated in July 2025.
Additional changes:
- In awx.awx.subscriptions module, use new service
account params rather than old basic auth params
- Update awx.awx.license module to use subscription_id
instead of pool_id. This is due to using a different API,
which identifies unique subscriptions by subscriptionID
instead of pool ID.
Signed-off-by: Seth Foster <fosterbseth@gmail.com>
Co-authored-by: Chris Meyers <chris.meyers.fsu@gmail.com>
Co-authored-by: Peter Braun <pbraun@redhat.com>
* fix token name
Signed-off-by: Seth Foster <fosterbseth@gmail.com>
* Fix Subscriptions credentials fallback
Ensure service account authentication is being used
when falling back to using SUBSCRIPTIONS_CLIENT_ID.
Additional change:
Subscription data can return two types of capacities:
Sockets and Nodes
For determining overall capacity
if capacity name is Nodes:
capacity quantity x subscription quantity
if capacity name is Sockets:
capacity quantity / 2 (minimum of 1) x subscription quantity
Signed-off-by: Seth Foster <fosterbseth@gmail.com>
---------
Signed-off-by: Seth Foster <fosterbseth@gmail.com>
Co-authored-by: Chris Meyers <chris.meyers.fsu@gmail.com>
Co-authored-by: Peter Braun <pbraun@redhat.com>