Remove SELECT FOR UPDATE from job dispatch to reduce transaction rollbacks
Move status transition from BaseTask.transition_status (which used
SELECT FOR UPDATE inside transaction.atomic()) into
dispatch_waiting_jobs. The new approach uses filter().update() which
is atomic at the database level without requiring explicit row locks,
reducing transaction contention and rollbacks observed in perfscale
testing.
The transition_status method was an artifact of the feature flag era
where we needed to support both old and new code paths. Since
dispatch_waiting_jobs is already a singleton
(on_duplicate='queue_one') scoped to the local node, the
de-duplication logic is unnecessary.
Status is updated after task submission to dispatcherd, so the job's
UUID is in the dispatch pipeline before being marked running —
preventing the reaper from incorrectly reaping jobs during the
handoff window. RunJob.run() handles the race where a worker picks
up the task before the status update lands by accepting waiting and
transitioning it to running itself.
Signed-off-by: Seth Foster <fosterbseth@gmail.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Add retrieve_workload_identity_jwt to jobs.py and tests
* Apply linting
* Add precondition to client retrieval
* Add test case for client not configured
* Remove trailing period in match string
* Introduced in PR https://github.com/ansible/awx/pull/16058/changes
then a later large merge from AAP back into devel removed the changes
* This PR re-introduces the github app lookup migration rename tests
with the migration names updated and the kind to namespace correction
* AAP-62657 Add populate_claims_for_workload function and unit tests
* Update safe_get helper function
* Trigger CI rebuild to pick up latest django-ansible-base
* Trigger CI after org visibility update
* Retrigger CI
* Rename workload to job, refine safe_get helper function
* Update test_jobs to use job fixture
* Retrigger CI
* Create fresh job, removed launched_by since this is read-only property
* Retrigger CI after runner issues
* Retrigger CI after runner issues
* Add unit tests for other workload types
* Update CLAIM_LAUNCHED_BY_USER_NAME and CLAIM_LAUNCHED_BY_USER_ID, with CLAIM_LAUNCHED_BY_NAME and CLAIM_LAUNCHED_BY_ID
* Generate claims with a more static schema
try to operate directly on object when possible
For cases where field is valid for the type, but null value
still add the field, so blank and null values appear
* Allow unified related items to be omittied
---------
Co-authored-by: AlanCoding <arominge@redhat.com>
* Enable new fancy asyncio metrics for dispatcherd
Remove old dispatcher metrics and patch in new data from local whatever
Update test fixture to new dispatcherd version
* Update dispatcherd again
* Handle node filter in URL, and catch more errors
* Add test for metric filter
* Split module for dispatcherd metrics
The OpenAPI schema incorrectly showed all 12 credential type kinds as
valid for POST/PUT/PATCH operations, when only 'cloud' and 'net' are
allowed for custom credential types. This caused API clients and LLM
agents to receive HTTP 400 errors when attempting to create credential
types with invalid kind values.
Add postprocessing hook to filter CredentialTypeRequest and
PatchedCredentialTypeRequest schemas to only show 'cloud', 'net',
and null as valid enum values, matching the existing validation logic.
No API behavior changes - this is purely a documentation fix.
Co-authored-by: Claude <noreply@anthropic.com>
Strip leading and trailing whitespace from SSH keys in validate_ssh_private_key()
to handle common copy-paste scenarios where hidden newlines cause base64 decoding
failures.
Changes:
- Added data.strip() in validate_ssh_private_key() before calling validate_pem()
- Added test_ssh_key_with_whitespace() to verify keys with leading/trailing
newlines are properly sanitized and validated
This prevents the confusing "HTTP 500: Internal Server Error" and
"binascii.Error: Incorrect padding" errors when users paste SSH keys with
accidental whitespace.
Fixes#14219
Signed-off-by: Joey Washburn <joey@joeywashburn.com>
* Add dispatcherctl command
* Add tests for dispatcherctl command
* Exit early if sqlite3
* Switch to dispatcherd mgmt cmd
* Move unwanted command options to run_dispatcher
* Add test for new stuff
* Update the SOS report status command
* make docs always reference new command
* Consistently error if given config file
* Additional dispatcher removal simplifications and waiting repear updates
* Fix double call and logging message
* Implement bugbot comment, should reap running on lost instances
* Add test case for new pending behavior
* WIP First pass
* started removing feature flags and adjusting logic
* Add decorator
* moved to dispatcher decorator
* updated as many as I could find
* Keep callback receiver working
* remove any code that is not used by the call back receiver
* add back auto_max_workers
* added back get_auto_max_workers into common utils
* Remove control and hazmat (squash this not done)
* moved status out and deleted control as no longer needed
* removed unused imports
* adjusted test import to pull correct method
* fixed imports and addressed clusternode heartbeat test
* Update function comments
* Add back hazmat for config and remove baseworker
* added back hazmat per @alancoding feedback around config
* removed baseworker completely and refactored it into the callback
worker
* Fix dispatcher run call and remove dispatch setting
* remove dispatcher mock publish setting
* Adjust heartbeat arg and more formatting
* fixed the call to cluster_node_heartbeat missing binder
* Fix attribute error in server logs
Refactored code to use Python's built-in datetime.timezone and zoneinfo instead of pytz for timezone handling. This modernizes the codebase and removes the dependency on pytz, aligning with current best practices for timezone-aware datetime objects.
* update to Python 3.12
* remove use of utcnow
* switch to timezone.utc
datetime.UTC is an alias of datetime.timezone.utc. if we're doing the double import for datetime it's more straightforward to just import timezone as well and get it directly
* debug python env version issue
* change python version
* pin to SHA and remove debug portion
* Remove the dynamic filter on dispatcher startup
Configure the dynamic logging level only on startup
* Special case for log level on settings change
* Add unit test for new behavior
* Add test for initial config
* Mark test django DB
* Do necessary requirement bump
* Delete cache in live test fixture
* Add test to recreate the error
* Also begin to add detection for empty event
* Remove breakpoint
* fix: ignore events with missing event types
* run linter and apply changes
---------
Co-authored-by: AlanCoding <arominge@redhat.com>
Co-authored-by: Peter Braun <pbraun@redhat.com>
* AAP-57817 Add Redis connection retry using redis-py 7.0+ built-in mechanism
* Refactor Redis client helpers to use settings and eliminate code duplication
* Create awx/main/utils/redis.py and move Redis client functions to avoid circular imports
* Fix subsystem_metrics to share Redis connection pool between
client and pipeline
* Cache Redis clients in RelayConsumer and RelayWebsocketStatsManager to avoid creating new connection pools on every call
* Add cap and base config
* Add Redis retry logic with exponential backoff to handle connection failures during long-running operations
* Add REDIS_BACKOFF_CAP and REDIS_BACKOFF_BASE settings to allow
adjustment of retry timing in worst-case scenarios without code changes
* Simplify Redis retry tests by removing unnecessary reload logic
* add force flag to refspec
* Development of git --amend test
* Update awx/main/tests/live/tests/conftest.py
Co-authored-by: Alan Rominger <arominge@redhat.com>
---------
Co-authored-by: AlanCoding <arominge@redhat.com>
* Change Swagger UI endpoint from /api/swagger/ to /api/docs/
- Update URL pattern to use /docs/ instead of /swagger/
- Update API root response to show 'docs' key instead of 'swagger'
- Add authentication requirement for schema documentation endpoints
- Update contact email to controller-eng@redhat.com
The schema endpoints (/api/docs/, /api/schema/, /api/redoc/) now
require authentication to prevent unauthorized access to API
documentation.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
* Require authentication for all schema endpoints including /api/schema/
Create custom view classes that enforce authentication for all schema
endpoints to prevent inconsistent access control where UI views required
authentication but the raw schema endpoint remained publicly accessible.
This ensures all schema endpoints (/api/schema/, /api/docs/, /api/redoc/)
consistently require authentication.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
* Add unit tests for authenticated schema view classes
Add test coverage for the new AuthenticatedSpectacular* view classes
to ensure they properly require authentication.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
* remove unused import
---------
Co-authored-by: Claude <noreply@anthropic.com>
* AAP-45927 Add drf-spectacular
- Remove drf-yasg
- Add drf-spectacular
* move SPECTACULAR_SETTINGS from development_defaults.py to defaults.py
* move SPECTACULAR_SETTINGS from development_defaults.py to defaults.py
* Fix swagger tests: enable schema endpoints in all modes
Schema endpoints were restricted to development mode, causing
test_swagger_generation.py to fail. Made schema URLs available in
all modes and fixed deprecated Django warning filters in pytest.ini.
* remove swagger from Makefile
* remove swagger from Makefile
* change docker-compose-build-swagger to docker-compose-build-schema
* remove MODE
* remove unused import
* Update genschema to use drf-spectacular with awx-link dependency
- Add awx-link as dependency for genschema targets to ensure package metadata exists
- Remove --validate --fail-on-warn flags (schema needs improvements first)
- Add genschema-yaml target for YAML output
- Add schema.yaml to .gitignore
* Fix detect-schema-change to not fail on schema differences
Add '-' prefix to diff command so Make ignores its exit status.
diff returns exit code 1 when files differ, which is expected behavior
for schema change detection, not an error.
* Truncate schema diff summary to stay under GitHub's 1MB limit
Limit schema diff output in job summary to first 1000 lines to avoid
exceeding GitHub's 1MB step summary size limit. Add message indicating
when diff is truncated and direct users to job logs or artifacts for
full output.
* readd MODE
* add drf-spectacular to requirements.in and the requirements.txt generated from the script
* Add drf-spectacular BSD license file
Required for test_python_licenses test to pass now that drf-spectacular
is in requirements.txt.
* add licenses
* Add comprehensive unit tests for CustomAutoSchema
Adds 15 unit tests for awx/api/schema.py to improve SonarCloud test
coverage. Tests cover all code paths in CustomAutoSchema including:
- get_tags() method with various scenarios (swagger_topic, serializer
Meta.model, view.model, exception handling, fallbacks, warnings)
- is_deprecated() method with different view configurations
- Edge cases and priority ordering
All tests passing.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
* remove unused imports
---------
Co-authored-by: Claude <noreply@anthropic.com>
* prometheus-client returns an additional value as of v.0.22.0
* add license, remove outdated ones, add new embedded sources
* update requirements and UPGRADE BLOCKERs in README
settings.SUBSCRIPTIONS_USERNAME and
settings.SUBSCRIPTIONS_CLIENT_ID
should be mutually exclusive. This is because
the POST to api/v2/config/attach/ accepts only
a subscription_id, and infers which credentials to
use based on settings. If both are set, it is ambiguous
and can lead to unexpected 400s when attempting
to attach a license.
Signed-off-by: Seth Foster <fosterbseth@gmail.com>
* We had race conditions with the system_administrator role being
created just-in-time. Instead of fixing the race condition(s), dodge
them by ensuring the role always exists
* Disconnect logic to fill in role parents
Get tests passing hopefully
Whatever SonarCloud
* remove role parents/children endpoints and related views
* remove duplicate get_queryset method from RoleTeamsList
---------
Co-authored-by: Peter Braun <pbraun@redhat.com>
Allow users to do subscription management using
Red Hat username and password.
In basic auth case, the candlepin API
at subscriptions.rhsm.redhat.com will be used instead
of console.redhat.com.
Signed-off-by: Seth Foster <fosterbseth@gmail.com>
* Added tests for cross org sharing of credentials
* added negative testing for sharing of credentials
* added conditions and tests for roleteamslist regarding cross org credentials
* removed redundant codes
* made error message more articulated and specific
Bump migrations and delete some files
Resolve remaining conflicts
Fix requirements
Flake8 fixes
Prefer devel changes for schema
Use correct versions
Remove sso connected stuff
Update to modern actions and collection fixes
Remove unwated alias
Version problems in actions
Fix more versioning problems
Update warning string
Messed it up again
Shorten exception
More removals
Remove pbr license
Remove tests deleted in devel
Remove unexpected files
Remove some content missed in the rebase
Use sleep_task from devel
Restore devel live conftest file
Add in settings that got missed
Prefer devel version of collection test
Finish repairing .github path
Remove unintended test file duplication
Undo more unintended file additions
* Allow creating galaxy credential types without an organization (#16077)
* remove requirement for galaxy credentials to belong to an organization
* remove organization check for galaxy credential type
* add functional test
* fix: do not create multiple mappers for lists of emails or usernames
* fix: create multiple matchers, don't rely on matches_or
* fix tests
* truncate mapper names to a max of 128 chars
* better naming scheme for matchers
* Add LDAP support to gateway_mapping and expand test coverage
- Add new process_ldap_user_list function for LDAP group processing
- Add auth_type parameter to org_map_to_gateway_format and team_map_to_gateway_format
- Support both 'sso' and 'ldap' authentication types in mapping functions
- Fix syntax error and logic bug in existing code
- Add comprehensive unit tests for process_ldap_user_list function (13 test cases)
- Add unit tests for auth_type parameter functionality
- Update helper functions to support new auth_type parameter
- All tests pass and maintain backward compatibility
Technical changes:
- process_ldap_user_list handles None, boolean, string, and list inputs
- Proper type hints with mypy compatibility
- LDAP groups use 'has_or' trigger format vs SSO attribute matching
- Boolean True/False create Always/Never Allow triggers for LDAP
- Maintains proper ordering and mapper structure
Co-authored-by: Claude (Anthropic AI Assistant) <claude@anthropic.com>
* Fix empty list bug in process_ldap_user_list and add comprehensive tests
- Fix process_ldap_user_list to return empty list for empty input instead of creating invalid trigger
- Empty list [] now correctly returns no triggers instead of trigger with empty has_or array
- Add test case for empty list behavior in both LDAP and SSO functions
- Update existing test_empty_list to expect correct behavior (0 triggers)
- Maintain backward compatibility for all other input types
- Comprehensive testing confirms no regression in existing functionality
Bug Details:
- Before: process_ldap_user_list([]) returned [{'name': 'Match User Groups', 'trigger': {'groups': {'has_or': []}}}]
- After: process_ldap_user_list([]) returns [] (correct behavior)
- SSO function already handled this correctly
This prevents potential Gateway issues with empty has_or arrays and ensures logical consistency.
Co-authored-by: Claude (Anthropic AI Assistant) <claude@anthropic.com>
* Add comprehensive LDAP migrator tests and fix category handling
- Add comprehensive unit test suite for LDAPMigrator class (26 tests)
- Test LDAP configuration scenarios including multiple instances, mappings, and edge cases
- Add tests for mixed boolean/group mappings, special characters in org names, and empty configs
- Fix LDAP authenticator category to always be 'ldap' (not 'ldap<suffix>')
- Add auth_type='ldap' parameter to org_map_to_gateway_format and team_map_to_gateway_format calls
- Include AAP-51531 reference comments for specific test cases
- All tests passing (26/26)
Co-authored-by: Claude <claude@anthropic.com>
---------
Co-authored-by: Claude (Anthropic AI Assistant) <claude@anthropic.com>