* Add dispatcherctl command
* Add tests for dispatcherctl command
* Exit early if sqlite3
* Switch to dispatcherd mgmt cmd
* Move unwanted command options to run_dispatcher
* Add test for new stuff
* Update the SOS report status command
* make docs always reference new command
* Consistently error if given config file
* Additional dispatcher removal simplifications and waiting reaper updates
* Fix double call and logging message
* Implement bugbot comment, should reap running on lost instances
* Add test case for new pending behavior
* WIP First pass
* started removing feature flags and adjusting logic
* Add decorator
* moved to dispatcher decorator
* updated as many as I could find
* Keep callback receiver working
* remove any code that is not used by the call back receiver
* add back auto_max_workers
* added back get_auto_max_workers into common utils
* Remove control and hazmat (squash this not done)
* moved status out and deleted control as no longer needed
* removed unused imports
* adjusted test import to pull correct method
* fixed imports and addressed clusternode heartbeat test
* Update function comments
* Add back hazmat for config and remove baseworker
* added back hazmat per @alancoding feedback around config
* removed baseworker completely and refactored it into the callback
worker
* Fix dispatcher run call and remove dispatch setting
* remove dispatcher mock publish setting
* Adjust heartbeat arg and more formatting
* fixed the call to cluster_node_heartbeat missing binder
* Fix attribute error in server logs
Refactored code to use Python's built-in datetime.timezone and zoneinfo instead of pytz for timezone handling. This modernizes the codebase and removes the dependency on pytz, aligning with current best practices for timezone-aware datetime objects.
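A minimal sketch of the pytz-to-zoneinfo pattern described above, not the actual AWX change. The helper name `to_local` and the sample zone are illustrative assumptions. With zoneinfo an aware datetime is built by passing `tzinfo` directly or by calling `astimezone()`, so pytz's `localize()` step is not needed.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo


def to_local(dt_utc, tz_name):
    """Convert an aware UTC datetime to the named IANA time zone.

    Hypothetical helper for illustration only.
    """
    return dt_utc.astimezone(ZoneInfo(tz_name))


# Aware datetime built with the standard library only.
utc_noon = datetime(2024, 1, 15, 12, 0, tzinfo=timezone.utc)
local = to_local(utc_noon, "America/New_York")

# Same instant, different wall-clock representation.
print(local.isoformat())  # 2024-01-15T07:00:00-05:00
```

zoneinfo reads the system time zone database, so the host (or the `tzdata` package) has to provide it.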
Deleted the awx/main/management/commands/graph_jobs.py file and removed the asciichartpy package from requirements. This cleans up unused code and dependencies related to terminal job status graphing.
* update to Python 3.12
* remove use of utcnow
* switch to timezone.utc
datetime.UTC is an alias of datetime.timezone.utc. If we're doing the double import for datetime, it's more straightforward to just import timezone as well and get it directly.
* debug python env version issue
* change python version
* pin to SHA and remove debug portion
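A short sketch of the `utcnow` removal and the switch to `timezone.utc` from this change set. The helper names `utc_now` and `from_timestamp` are assumptions for illustration and are not taken from the AWX code. `datetime.utcnow()` and `datetime.utcfromtimestamp()` return naive datetimes and are deprecated as of Python 3.12; the replacements below return aware datetimes.

```python
from datetime import datetime, timezone


def utc_now():
    """Aware replacement for the deprecated datetime.utcnow()."""
    return datetime.now(timezone.utc)


def from_timestamp(ts):
    """Aware replacement for the deprecated datetime.utcfromtimestamp()."""
    return datetime.fromtimestamp(ts, tz=timezone.utc)


now = utc_now()
epoch = from_timestamp(0)

# The result carries tzinfo, so comparisons with other aware
# datetimes do not raise TypeError.
print(now.tzinfo)         # UTC
print(epoch.isoformat())  # 1970-01-01T00:00:00+00:00
```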
* Remove the dynamic filter on dispatcher startup
Configure the dynamic logging level only on startup
* Special case for log level on settings change
* Add unit test for new behavior
* Add test for initial config
* Mark test django DB
* Do necessary requirement bump
* Delete cache in live test fixture
Upgrade to Django 5.2 LTS with compatibility fixes across fields, migrations, dispatch config, tests, and dev deps.
Dependencies:
- Upgrade django to 5.2.8 and relax requirements.in to >=5.2,<5.3.
- Bump django-debug-toolbar to >=6.0 for compatibility.
Backend:
- awx/conf/fields.py: switch URL TLD regex to use DomainNameValidator.ul in custom URLField.
- awx/main/management/commands/gather_analytics.py: use datetime.timezone.utc for naïve datetime handling.
- awx/main/dispatch/config.py: add mock_publish option; avoid DB access for test runs, set default max_workers, and support a noop broker.
Migrations (SQLite/Postgres compatibility):
- Add awx/main/migrations/_sqlite_helper.py with db-aware AlterIndexTogether/RenameIndex wrappers; consume in 0144_event_partitions.py and 0184_django_indexes.py.
- Update 0187_hop_nodes.py to use CheckConstraint(condition=...).
- Add 0205_alter_instance_peers_alter_job_hosts_and_more.py adjusting through_fields/relations on instance.peers, job.hosts, and role.ancestors.
- _dab_rbac.py: iterate roles with chunk_size=1000 for migration performance.
Tests:
Include hcp_terraform in default credential types in test_credential.py.
---------
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Alan Rominger <arominge@redhat.com>
Bump migrations and delete some files
Resolve remaining conflicts
Fix requirements
Flake8 fixes
Prefer devel changes for schema
Use correct versions
Remove sso connected stuff
Update to modern actions and collection fixes
Remove unwanted alias
Version problems in actions
Fix more versioning problems
Update warning string
Messed it up again
Shorten exception
More removals
Remove pbr license
Remove tests deleted in devel
Remove unexpected files
Remove some content missed in the rebase
Use sleep_task from devel
Restore devel live conftest file
Add in settings that got missed
Prefer devel version of collection test
Finish repairing .github path
Remove unintended test file duplication
Undo more unintended file additions
* Added better error handling and messaging when the service token authentication is broken. Allowed for GATEWAY_BASE_URL to override the service token's base url if it is set in the environment variables.
Co-Authored-By: Cursor (claude-4-sonnet)
* Removed GATEWAY_BASE_URL override for service token auth.
* migrate settings using the existing authenticator framework
* add method to get settings value to gateway client
* add transformer functions for settings
* Switched back to PUT for settings updates
* Started wiring in testing changes
* Added settings_* aggregation results. Added skip-github option. Added tests.
Assisted-by: Cursor
* Added --skip-all-authenticators command line argument. Added GoogleOAuth testing. Added tests for skipping all authenticators.
Assisted-by: Cursor
* wip: migrate other missing settings
* update login_redirect_override in google_oauth2
* implement login redirect for azuread
* implement login redirect for github
* implement login redirect for saml
* set LOGIN_REDIRECT_OVERRIDE even if no authenticator matched
* extract logic for login redirect override to base class
* use urlparse to compare valid redirect urls
* Preserve the original query parameters
* Fix flake8 issues
* Preserve the query parameter in sso_login_url
Gateway sets the sso_login_url to
/api/gateway/social/login/aap-saml-keycloak/?idp=IdP
The idp needs to be preserved when creating the redirect
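A hedged sketch of the query-preservation idea above, using only `urllib.parse`. The function names `add_query_params` and `same_location` and the `next` parameter are assumptions for illustration; they are not the names used in the migrator code.

```python
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse


def add_query_params(url, extra):
    """Append parameters to a URL while keeping the existing query string."""
    parts = urlparse(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.extend(extra.items())
    return urlunparse(parts._replace(query=urlencode(query)))


def same_location(a, b):
    """Compare two URLs by scheme, host and path, ignoring query and trailing slash."""
    pa, pb = urlparse(a), urlparse(b)
    return (pa.scheme, pa.netloc, pa.path.rstrip("/")) == (
        pb.scheme,
        pb.netloc,
        pb.path.rstrip("/"),
    )


sso_login_url = "/api/gateway/social/login/aap-saml-keycloak/?idp=IdP"
redirect = add_query_params(sso_login_url, {"next": "/"})

# idp=IdP survives; the new parameter is appended after it.
print(redirect)
```

Parsing the URL instead of concatenating strings avoids producing a second `?` when the source URL already has a query.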
* Update awx/main/utils/gateway_client.py
Co-authored-by: Chris Meyers <chrismeyersfsu@users.noreply.github.com>
* Update awx/main/management/commands/import_auth_config_to_gateway.py
Co-authored-by: Chris Meyers <chrismeyersfsu@users.noreply.github.com>
* list of settings updated
* Update awx/main/utils/gateway_client.py
Co-authored-by: Chris Meyers <chrismeyersfsu@users.noreply.github.com>
* Update awx/sso/utils/base_migrator.py
Co-authored-by: Chris Meyers <chrismeyersfsu@users.noreply.github.com>
* fix tests
---------
Co-authored-by: Andrew Potozniak <potozniak@redhat.com>
Co-authored-by: Madhu Kanoor <mkanoor@redhat.com>
Co-authored-by: Chris Meyers <chrismeyersfsu@users.noreply.github.com>
* feat: AAP-48498 Radius authenticator migrator
Issue: AAP-48498
* fix: Naming, style, and tests
* enabled by default
* test: SECRET is now ignored unless --force is set
* add force flag to enforce updates even when authenticator already exists
* remove cleartext field
* update list of encrypted fields
* show updated and unchanged authenticators in report
* collect controller ldap configuration
* translate role mapping and submit ldap authenticator
* implement require and deny group mapping
* remove all references of awx in the naming
* fix linter issues
* address PR feedback
* update ldap authenticator naming
* update github authenticator naming
* assume that server_uri is always a string
* update order of evaluation for require and deny groups
* cleanup and move ldap related functions into the ldap migrator
* add skip option for saml
* update saml authenticator to new slug format
* update azuread authenticator to new slug format
This PR migrates the SAML configuration from the Controller
to the Gateway. It intentionally skips setting the CALLBACK_URL
so that the Gateway can fill in the appropriate URL.
* compare authenticators and mappers before recreating them
* add unit tests
* fix linter errors
* refactor and improve: better implementation for get_authenticator_by_slug and removal of redundant code
* add submit_authenticator method to handle create vs. update in a generic way
* remove unused import
* wip: management command for authenticator export to Gateway
* wip: implement ldap auth config migration
* refactor: split concerns into gathering config and converting / recreating config
* refactor: dry run by default
* use the authenticator slug for idempotency
* move to correct utils path
* use env vars instead of flags, fix linter errors
* remove unused import
* Delete existing all-group vars on inventory sync (with overwrite-vars=True) instead of merging them.
* Implementation of inv var handling with file as db.
* Improve serialization to file of inv vars for src update
* Include inventory-level variable editing into inventory source update handling
* Add group vars to inventory source update handling
* Add support for overwrite_vars to new inventory source handling
* Persist inventory var history in the database instead of a file.
* Remove logging which was needed during development.
* Remove further debugging code and improve comments
* Move special handling for user edits of variables into serializers
* Relate the inventory variable history model to its inventory
* Allow for inventory variables to have the value 'None'
* Fix KeyError in new inventory variable handling
* Add unique-together constraint for new model InventoryGroupVariablesWithHistory
* Use only one special invsrc_id for initial update and manual updates
* Fix internal server error when creating a new inventory
* Print the empty string for a variable with value 'None'
* Fix comment which incorrectly states old behaviour
* Fix inventory_group_variables_update tests which did not take the new handling of None into account
* Allow any type for Ansible-core variable values
* Refactor misleading method names
* Fix internal server error when saving vars from group form
* Remove superfluous json conversion in front of JSONField
* Call variable update from create/update instead of from validate
* Use group_id instead of group_name in model InventoryGroupVariablesWithHistory
* Disable new variable update handling for all regular (non-'all') groups
* Add live test to verify AAP-17690 (inv var deleted from source)
* Add functional tests to verify inventory variables update logic
* Fix migration which was corrupted by a rebase
* Add a more complex live test and resolve linter complaints
* Force overwrite_vars=False for updates from source on all-group
* Change behavior with respect to overwrite_vars
Use dynamic AWX max_workers value
Make basic --status and --running commands work
Make feature flag enabled true by default for development
* [dispatcherd] Dispatcher socket-based `--status` demo working (#15908)
* Fix Task Decorator to Work With and Without Feature Flag (AAP-41775) (#15911)
* refactor(system): extract common heartbeat helpers and split cluster_node_heartbeat
Extract common heartbeat logic into helper functions:
- _heartbeat_instance_management: consolidates instance management, health checks, and lost-instance detection.
- _heartbeat_check_versions: compares instance versions and initiates shutdown when necessary.
- _heartbeat_handle_lost_instances: reaps jobs and marks lost instances offline.
Refactor the original cluster_node_heartbeat to use these helpers and retain legacy behavior (using bind_kwargs).
Introduce adispatch_cluster_node_heartbeat for dispatcherd: uses the control API to retrieve running tasks and reaps them.
Link the two implementations by attaching adispatch_cluster_node_heartbeat as the _new_method on cluster_node_heartbeat.
* feat(publish): delegate heartbeat task submission to new dispatcherd implementation
Update apply_async to check at runtime if FEATURE_NEW_DISPATCHER is enabled.
When the task is cluster_node_heartbeat and a _new_method is attached, delegate the task submission to the new dispatcherd implementation.
Preserve the original behavior for all other tasks and fallback on error.
* refactor(system): extract task ID retrieval from dispatcherd into helper function
Improves readability of adispatch_cluster_node_heartbeat by extracting
the complex UUID parsing logic into a dedicated helper function.
Adds clearer error handling and follows established code patterns.
* fix(dispatcher): Enable task decorator to work with and without feature flag
Implemented a new approach for handling task execution with feature flags
by attaching alternative implementations to apply_async._new_method. This
allows cluster_node_heartbeat to work correctly with both the legacy and
new dispatcher systems without modifying core decorator logic.
AAP-41775
* fix(dispatcher): Improve error handling and logging in feature flag implementation
- Add error handling when attaching alternative dispatcher implementation
- Fix method self-reference in apply_async to properly use cls.apply_async
- Document limitations of this targeted approach for specific tasks
- Add logging for better debugging of dispatcher selection
- Ensure decorator timing by keeping method attachment after function definitions
This completes the robust implementation for switching between dispatcher
implementations based on feature flags.
AAP-41775
* fix(dispatcher): Implement registry pattern for dispatcher feature flag compatibility
Replaces direct method attribute assignment with a global registry for
alternative implementations. The original approach tried to attach new
methods directly to apply_async bound methods, which fails because bound
methods don't support attribute assignment in Python.
The registry pattern:
- Creates a global ALTERNATIVE_TASK_IMPLEMENTATIONS dict in publish.py
- Registers alternative implementations by task name
- Modifies apply_async to check the registry when feature flag is enabled
- Adds extensive logging throughout the process for debugging
This enables cluster_node_heartbeat to work correctly with both the legacy
and new dispatcher implementations based on the FEATURE_NEW_DISPATCHER flag.
AAP-41775
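A simplified sketch of the registry pattern described above, not the code from publish.py. Only the `ALTERNATIVE_TASK_IMPLEMENTATIONS` name comes from the commit message; `HeartbeatTask`, `register_alternative`, and the `use_new_dispatcher` argument (standing in for the FEATURE_NEW_DISPATCHER flag) are assumptions for illustration. It also shows why the earlier attribute-assignment approach fails.

```python
# Global registry: task name -> alternative implementation.
ALTERNATIVE_TASK_IMPLEMENTATIONS = {}


def register_alternative(task_name, func):
    """Register an alternative implementation under the task's name."""
    ALTERNATIVE_TASK_IMPLEMENTATIONS[task_name] = func


class HeartbeatTask:
    name = "cluster_node_heartbeat"

    @classmethod
    def apply_async(cls, use_new_dispatcher=False):
        # Consult the registry only when the (stand-in) flag is enabled.
        if use_new_dispatcher:
            alternative = ALTERNATIVE_TASK_IMPLEMENTATIONS.get(cls.name)
            if alternative is not None:
                return alternative()
        return "legacy heartbeat"


def attach_to_bound_method():
    """Try the earlier approach: set an attribute on the bound method."""
    try:
        HeartbeatTask.apply_async._new_method = lambda: "dispatcherd heartbeat"
    except AttributeError:
        # Bound method objects do not accept new attributes.
        return False
    return True


register_alternative(HeartbeatTask.name, lambda: "dispatcherd heartbeat")

print(attach_to_bound_method())                             # False
print(HeartbeatTask.apply_async())                          # legacy heartbeat
print(HeartbeatTask.apply_async(use_new_dispatcher=True))   # dispatcherd heartbeat
```

Keying the registry by task name keeps the lookup independent of how the method object is created on each attribute access.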
* refactor(dispatcher): Remove excessive logging from dispatcher implementation
Reduces verbose debugging logs while maintaining essential logging for critical
operations. Preserves:
- Task implementation selection based on feature flag
- Registration success/failure messages
- Critical error reporting
Removed:
- Registry content debugging messages
- Repetitive task diagnostics
- Non-essential information logging
AAP-41775
* fix(dispatcher): Fix shallow copy in dispatcher schedule conversion
This resolves "AttributeError: 'float' object has no attribute 'total_seconds'"
errors when the dispatcher is restarted.
Refs: AAP-41775
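One way a shallow copy can produce this error, sketched under assumptions: the schedule layout and the function names below are hypothetical and are not the AWX conversion code. A shallow copy shares the nested entries, so converting a timedelta to seconds in place also changes the source settings, and a second conversion then calls `total_seconds()` on a float.

```python
import copy
from datetime import timedelta


def convert_shallow(schedule):
    """Buggy variant: dict.copy() copies only the outer dict."""
    result = schedule.copy()
    for entry in result.values():
        # Mutates the entry shared with the caller's schedule.
        entry["schedule"] = entry["schedule"].total_seconds()
    return result


def convert_deep(schedule):
    """Fixed variant: deepcopy leaves the caller's schedule untouched."""
    result = copy.deepcopy(schedule)
    for entry in result.values():
        entry["schedule"] = entry["schedule"].total_seconds()
    return result


settings_schedule = {
    "heartbeat": {"task": "cluster_node_heartbeat", "schedule": timedelta(seconds=60)},
}

first = convert_deep(settings_schedule)
second = convert_deep(settings_schedule)  # safe to repeat, e.g. on restart

print(first["heartbeat"]["schedule"])              # 60.0
print(settings_schedule["heartbeat"]["schedule"])  # 0:01:00
```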
* Use IPC mechanism to get running tasks (#15926)
* Allow tasks from tasks
* Fix failure to limit to waiting jobs
* Get job record with lock
* Fix failures in dispatcherd feature branch (#15930)
* Fully handle DispatcherCancel
* Complete rest of preload import work
* Complete dispatcherd integration & job cancellation (AAP-43033) (#15941)
* feat(dispatcher): Implement job cancellation for new dispatcher
Adds feature-flag-aware job cancellation that routes cancel requests to either
the legacy dispatcher or the new dispatcherd library based on the
FEATURE_NEW_DISPATCHER flag.
- Updates cancel_dispatcher_process() to use dispatcherd's control API when enabled
- Handles both direct cancellation and task manager workflow cancellation cases
- Works with DispatcherCancel exception handling to properly handle SIGUSR1 signals
AAP-43033
* fix(dispatcher): Update run_dispatcher.py to properly handle task cancellation
Modifies the cancel command in run_dispatcher.py to properly cancel tasks
when the FEATURE_NEW_DISPATCHER flag is enabled, rather than just listing
running tasks.
The implementation translates each task UUID to the appropriate
filter format expected by the dispatcherd control API, maintaining the same
behavior as the original implementation.
Part of: AAP-43033
* refactor(system): Refactor dispatch_startup() to extract common startup logic and branch based on feature flag
This commit refactors the dispatch_startup() function to improve clarity and consistency across the legacy
and new dispatcherd flows.
No dispatcher-specific functionality is needed beyond the changes made, so this refactoring improves robustness without
altering core behavior.
* refactor(system): Refactor inform_cluster_of_shutdown() for clarity
* refactor(tasks): Replace @task with @task_awx across 22 tasks for dispatcher compatibility
- Migrated all task decorators to use @task_awx, ensuring dispatcher-aware behavior.
- Tested each task with the new dispatcherd, verifying that tasks using the registry pattern execute correctly without needing binder-based alternative implementations.
- Removed redundant logging and outdated comments.
- Legacy tasks that do not require special parameter extraction continue to use their original logic.
- This commit reflects our complete journey of testing and verifying dispatcherd compatibility across all 22 tasks.
* refactor(publish): fix linter
* Fix bug from the branch rebase
* AAP-43763 Add tests for connection management in dispatcherd workers (#15949)
* Add test for job cancel in live tests
* Fix bug from the branch rebase
* Add test for connection recovery after connection broke
* Add test for breaking connection
* Fix dispatcherd bugs: schedule aliases, job kwargs handling, cancel handling (#15960)
* Put in job kwargs handling, not done before
* AAP-44382 [dispatcherd] Fixes for running with feature flag off (#15973)
* Use correct decorator for test of tasks
* Finalize dispatcherd feature branch (#15975)
* Work dispatcherd into dependency management system
* Use util methods from DAB
* Rename the dispatcherd feature flag, and flip default to not-enabled
* Move to new submit_task method
* Update the location of the sock file
* AAP-44381 Make dispatcherd config loading more lazy (#15979)
* Make dispatcherd config loading more lazy
* Make submission error more obvious
* Fix signal handling gap, hijack SIGUSR1 from dispatcherd (#15983)
* Fix signal handling gap, hijack SIGUSR1 from dispatcherd
* Minor adjustments to dispatcherd status command
* [dispatcherd] Get rid of alternative task registry (#15984)
Get rid of alternative task registry
* Fix deadlock error and other cleanup errors (#15987)
* Move to proper error handling location
---------
Co-authored-by: artem_tiupin <70763601+art-tapin@users.noreply.github.com>
* Delete existing all-group vars on inventory sync (with overwrite-vars=True) instead of merging them.
* Implementation of inv var handling with file as db.
* Improve serialization to file of inv vars for src update
* Include inventory-level variable editing into inventory source update handling
* Add group vars to inventory source update handling
* Add support for overwrite_vars to new inventory source handling
* Persist inventory var history in the database instead of a file.
* Remove logging which was needed during development.
* Remove further debugging code and improve comments
* Move special handling for user edits of variables into serializers
* Relate the inventory variable history model to its inventory
* Allow for inventory variables to have the value 'None'
* Fix KeyError in new inventory variable handling
* Add unique-together constraint for new model InventoryGroupVariablesWithHistory
* Use only one special invsrc_id for initial update and manual updates
* Fix internal server error when creating a new inventory
* Print the empty string for a variable with value 'None'
* Fix comment which incorrectly states old behaviour
* Fix inventory_group_variables_update tests which did not take the new handling of None into account
* Allow any type for Ansible-core variable values
* Refactor misleading method names
* Fix internal server error when saving vars from group form
* Remove superfluous json conversion in front of JSONField
* Call variable update from create/update instead of from validate
* Use group_id instead of group_name in model InventoryGroupVariablesWithHistory
* Disable new variable update handling for all regular (non-'all') groups
* Add live test to verify AAP-17690 (inv var deleted from source)
* Add functional tests to verify inventory variables update logic
* Fix migration which was corrupted by a rebase
* Add a more complex live test and resolve linter complaints
* Force overwrite_vars=False for updates from source on all-group
* Change behavior with respect to overwrite_vars
* Remove oauth provider
This removes the oauth provider functionality from awx. The
oauth2_provider app and all references to it have been removed.
Migrations to delete the two tables that locally overwrote
oauth2_provider tables are included. This change does not include
migrations to delete the tables provided by the oauth2_provider app.
Also not included here are changes to awxkit, awx_collection or the ui.
* Fix linters
* Update migrations after rebase
* Update collection tests for auth changes
The changes in https://github.com/ansible/awx/pull/15554 will cause a
few collection tests to fail, depending on what the test configuration
is. This changes the tests to look for a specific warning rather than
counting the number of warnings emitted.
* Update migration
* Removed unused oauth_scopes references
---------
Co-authored-by: Mike Graves <mgraves@redhat.com>
Co-authored-by: Alan Rominger <arominge@redhat.com>
* Register all discovered CredentialType(s) after Django finishes
loading
* Protect parallel registrations using shared postgres advisory lock
* The down-side of this is that this will run when it does not need to,
adding overhead to the init process.
* Only register discovered credential types in the database IF
migrations have run and are up-to-date.
We have not identified the root cause of the wsrelay failure, but the attempt to make wsrelay restart itself resulted in postgres and redis connection leaks. We were not able to fully identify where the redis connection leak comes from, so we are reverting to failing; removing startsecs 30 will prevent wsrelay from going to FATAL.
- when re-establishing connection to db close old connection
- re-initialize WebSocketRelayManager when restarting asyncio.run
- log and ignore error in cleanup_offline_host (this might come back to bite us)
- clean up connections when WebSocketRelayManager crashes
* Add dump_auth_config management cmd
- Dump SAML config from AWX to DAB authenticator config in json format
* Add dumping of LDAP settings
* add test for command
* Fix is_enabled
* fix command name typo
Co-authored-by: Hao Liu <44379968+TheRealHaoLiu@users.noreply.github.com>
* add fields to config, add name to data
* break out IDP values
* change test fields and value comparison
* edit help text, reformat settings
---------
Co-authored-by: jessicamack <jmack@redhat.com>
* Organize metrics into their respective service
* Serve per-service metrics on a per-service http server
* Increase prometheus client usage over our custom metrics fields
Adds remove_receptor_address to delete a
receptor address from the database
Also, enforce that only 1 canonical address
can be added to an instance via
the add_receptor_address command.
Signed-off-by: Seth Foster <fosterbseth@gmail.com>