* Enable new fancy asyncio metrics for dispatcherd
Remove old dispatcher metrics and patch in new data from local whatever
Update test fixture to new dispatcherd version
* Update dispatcherd again
* Handle node filter in URL, and catch more errors
* Add test for metric filter
* Split module for dispatcherd metrics
The OpenAPI schema incorrectly showed all 12 credential type kinds as
valid for POST/PUT/PATCH operations, when only 'cloud' and 'net' are
allowed for custom credential types. This caused API clients and LLM
agents to receive HTTP 400 errors when attempting to create credential
types with invalid kind values.
Add postprocessing hook to filter CredentialTypeRequest and
PatchedCredentialTypeRequest schemas to only show 'cloud', 'net',
and null as valid enum values, matching the existing validation logic.
No API behavior changes - this is purely a documentation fix.
Co-authored-by: Claude <noreply@anthropic.com>
Changed two instances of 'cancelled' to 'canceled' in awx/main/wsrelay.py
to match AWX's standardized American English spelling convention.
- Updated log message in WebsocketRelayConnection.connect()
- Updated comment in WebSocketRelayManager.cleanup_offline_host()
Fixes#15177
Signed-off-by: Joey Washburn <joey@joeywashburn.com>
Strip leading and trailing whitespace from SSH keys in validate_ssh_private_key()
to handle common copy-paste scenarios where hidden newlines cause base64 decoding
failures.
Changes:
- Added data.strip() in validate_ssh_private_key() before calling validate_pem()
- Added test_ssh_key_with_whitespace() to verify keys with leading/trailing
newlines are properly sanitized and validated
This prevents the confusing "HTTP 500: Internal Server Error" and
"binascii.Error: Incorrect padding" errors when users paste SSH keys with
accidental whitespace.
Fixes#14219
Signed-off-by: Joey Washburn <joey@joeywashburn.com>
* Add dispatcherctl command
* Add tests for dispatcherctl command
* Exit early if sqlite3
* Switch to dispatcherd mgmt cmd
* Move unwanted command options to run_dispatcher
* Add test for new stuff
* Update the SOS report status command
* make docs always reference new command
* Consistently error if given config file
This setting is set in defaults.py, but
currently not being used. More technically,
project_update.yml is not passing this value to
the insights.py action plugin. Therefore, we
can safely remove references to it.
insights.py already has a default oidc endpoint
defined for authentication.
Signed-off-by: Seth Foster <fosterbseth@gmail.com>
* Additional dispatcher removal simplifications and waiting repear updates
* Fix double call and logging message
* Implement bugbot comment, should reap running on lost instances
* Add test case for new pending behavior
* Added link and ref to openAPI spec for community
* Update docs/docsite/rst/contributor/openapi_link.rst
Co-authored-by: Don Naro <dnaro@redhat.com>
* add sphinxcontrib-redoc to requirements
* sphinxcontrib.redoc configuration
* create openapi directory and files
* update download script for both schema files
* suppress warning for redoc
* update labels
* fix extra closing parenthesis
* update schema url
* exclude doc config and download script
The Sphinx configuration (conf.py) and schema download script
(download-json.py) are not application logic and used only for building
documentation. Coverage requirements for these files are overkill.
* exclude only the sphinx config file
---------
Co-authored-by: Don Naro <dnaro@redhat.com>
* WIP First pass
* started removing feature flags and adjusting logic
* Add decorator
* moved to dispatcher decorator
* updated as many as I could find
* Keep callback receiver working
* remove any code that is not used by the call back receiver
* add back auto_max_workers
* added back get_auto_max_workers into common utils
* Remove control and hazmat (squash this not done)
* moved status out and deleted control as no longer needed
* removed unused imports
* adjusted test import to pull correct method
* fixed imports and addressed clusternode heartbeat test
* Update function comments
* Add back hazmat for config and remove baseworker
* added back hazmat per @alancoding feedback around config
* removed baseworker completely and refactored it into the callback
worker
* Fix dispatcher run call and remove dispatch setting
* remove dispatcher mock publish setting
* Adjust heartbeat arg and more formatting
* fixed the call to cluster_node_heartbeat missing binder
* Fix attribute error in server logs
* Enhance OpenAPI schema with AI descriptions and fix method names
Add x-ai-description extensions to API endpoints for better AI agent
comprehension. Fix view method names to
ensure proper drf-spectacular schema generation.
* Enhance OpenAPI schema with AI descriptions and fix method names
Add x-ai-description extensions to API endpoints for better AI agent
comprehension. Fix view method names to
ensure proper drf-spectacular schema generation.
Remove transitive dependencies no longer needed by kubernetes 35.0.0
Removes google-auth and rsa which were transitive dependencies of the older
kubernetes client but are no longer required in v35.0.0.
Adds cachetools as a direct dependency since it's used by awx/conf/settings.py
for TTLCache (was previously a transitive dep of google-auth).
- Move kubernetes from git-based install to PyPI (v35.0.0 now available)
- Remove urllib3 cap comment since kubernetes 35.0.0 no longer restricts it
- Update README.md upgrade blocker documentation
* docs: update readthedocs.io URLs to docs.ansible.com equivalents
🤖 Generated with Claude Code
https://claude.ai/code
Co-Authored-By: Claude <noreply@anthropic.com>
* Update Bullhorn newsletter link in communication docs
---------
Co-authored-by: Claude <noreply@anthropic.com>
Refactored code to use Python's built-in datetime.timezone and zoneinfo instead of pytz for timezone handling. This modernizes the codebase and removes the dependency on pytz, aligning with current best practices for timezone-aware datetime objects.
Introduces new Makefile targets to update and upgrade requirements files using pip-compile, both directly and via docker-runner. These additions streamline dependency management for development and CI workflows.
Switch to git-based installation of kubernetes python client from
github.com/kubernetes-client/python at commit df31d90d6c910d6b5c883b98011c93421cac067d
(release-34.0 branch). This also allows removing the urllib3<2.4.0 upper bound
constraint that was previously required by kubernetes 34.1.0 from PyPI.
Use dnf module for Node.js 18 instead of n version manager
The n version manager fails to extract Node.js archives due to very long
file paths in include/node/openssl/archs/ directories when running in
Docker BuildKit's overlay filesystem. This causes CI build failures with
tar "Cannot open: Invalid argument" errors.
Switch to installing Node.js 18 directly from CentOS Stream 9's module
stream which avoids the archive extraction issue entirely.
* Fix ARM64 build failure by upgrading dev container Node.js to 18
Node.js 16.13.1 fails to extract on ARM64 in Docker BuildKit's
overlay filesystem during multi-arch builds. Upgrade to Node 18
which is already used by the UI builder stage and has proper
ARM64 support.
* Fix collectstatic failure by setting AWX_MODE=default
AWX_MODE=defaults is an intentionally "invalid" environment name that:
1. Loads only defaults.py - the base settings file without any environment-specific overrides (development_defaults.py, production_defaults.py, etc.)
2. Bypasses production checks - since "production" not in "defaults", it skips the assertion that requires /etc/tower/settings.py to exist
3. Bypasses development mode - since is_development_mode would be false
This is perfect for collectstatic during container build because:
- No database connection needed
- No secret key needed (hence SKIP_SECRET_KEY_CHECK)
- No PostgreSQL version check (hence SKIP_PG_VERSION_CHECK)
- Just need minimal Django settings to collect static files
* Fix pip version constraint for Python 3.12 compatibility
Remove outdated pip<22.0 constraint that was a workaround for
pip-tools#1558. This issue was fixed in pip-tools 6.5.0+ and
the old constraint breaks Python 3.12 where pkgutil.ImpImporter
was removed.
* Update requirements.txt
* Fix license file inconsistencies with requirements
- Rename awx-plugins.interfaces.txt to awx-plugins-interfaces.txt
to match the package name in requirements
- Remove backports-tarfile.txt and importlib-resources.txt as these
packages are no longer in requirements
* Fix updater.sh for pip 25.3 normalized output format
Changes to requirements_git.txt:
- Update to PEP 440 format (name @ git+url) to match pip-compile output
- Normalize package names (hyphens instead of dots/underscores)
- Sort extras alphabetically with hyphens (e.g., jwt-consumer not jwt_consumer)
- Add documentation explaining format requirements
Changes to updater.sh:
- Escape BRE regex metacharacters in sed pattern to handle brackets in extras
- Change sed delimiter from ! to | to avoid conflict with comment text
- Add explicit return statements to functions
- Assign positional parameters to local variables
- Redirect error messages to stderr
- Replace backticks with $() for command substitution
- Pin pip to version 25.3
requirements.txt regenerated via updater.sh
* Normalize package names in requirements.in to match pip output
- prometheus_client -> prometheus-client
- setuptools_scm -> setuptools-scm
- dispatcherd[pg_notify] -> dispatcherd[pg-notify]
PEP 503 specifies that package names should use hyphens.
* Fix license files to match normalized package names
- Remove awx_plugins.interfaces.txt (duplicate of awx-plugins-interfaces.txt)
- Rename system-certifi.txt to certifi.txt to match package name
Deleted the awx/main/management/commands/graph_jobs.py file and removed the asciichartpy package from requirements. This cleans up unused code and dependencies related to terminal job status graphing.
* update to Python 3.12
* remove use of utcnow
* switch to timezone.utc
datetime.UTC is an alias of datetime.timezone.utc. if we're doing the double import for datetime it's more straightforward to just import timezone as well and get it directly
* debug python env version issue
* change python version
* pin to SHA and remove debug portion
* Remove the dynamic filter on dispatcher startup
Configure the dynamic logging level only on startup
* Special case for log level on settings change
* Add unit test for new behavior
* Add test for initial config
* Mark test django DB
* Do necessary requirement bump
* Delete cache in live test fixture
* Add test to recreate the error
* Also begin to add detection for empty event
* Remove breakpoint
* fix: ignore events with missing event types
* run linter and apply changes
---------
Co-authored-by: AlanCoding <arominge@redhat.com>
Co-authored-by: Peter Braun <pbraun@redhat.com>
docker buildx build fails with
"Error: Unable to find a match: rsyslog-8.2102.0-106.el9"
unpinning builds successfully for both arm64 and x86_64
Signed-off-by: Seth Foster <fosterbseth@gmail.com>
Upgrade to Django 5.2 LTS with compatibility fixes across fields, migrations, dispatch config, tests, and dev deps.
Dependencies:
- Upgrade django to 5.2.8 and relax requirements.in to >=5.2,<5.3.
- Bump django-debug-toolbar to >=6.0 for compatibility.
Backend:
- awx/conf/fields.py: switch URL TLD regex to use DomainNameValidator.ul in custom URLField.
- awx/main/management/commands/gather_analytics.py: use datetime.timezone.utc for naïve datetime handling.
- awx/main/dispatch/config.py: add mock_publish option; avoid DB access for test runs, set default max_workers, and support a noop broker.
Migrations (SQLite/Postgres compatibility):
- Add awx/main/migrations/_sqlite_helper.py with db-aware AlterIndexTogether/RenameIndex wrappers; consume in 0144_event_partitions.py and 0184_django_indexes.py.
- Update 0187_hop_nodes.py to use CheckConstraint(condition=...).
- Add 0205_alter_instance_peers_alter_job_hosts_and_more.py adjusting through_fields/relations on instance.peers, job.hosts, and role.ancestors.
- _dab_rbac.py: iterate roles with chunk_size=1000 for migration performance.
Tests:
Include hcp_terraform in default credential types in test_credential.py.
---------
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Alan Rominger <arominge@redhat.com>
Adding ansible_base.api_documentation
to the INSTALL_APPS which extends the schema
to include an LLM-friendly description
to each endpoint
---------
Signed-off-by: Seth Foster <fosterbseth@gmail.com>
Co-authored-by: Peter Braun <pbraun@redhat.com>
* AAP-57817 Add Redis connection retry using redis-py 7.0+ built-in mechanism
* Refactor Redis client helpers to use settings and eliminate code duplication
* Create awx/main/utils/redis.py and move Redis client functions to avoid circular imports
* Fix subsystem_metrics to share Redis connection pool between
client and pipeline
* Cache Redis clients in RelayConsumer and RelayWebsocketStatsManager to avoid creating new connection pools on every call
* Add cap and base config
* Add Redis retry logic with exponential backoff to handle connection failures during long-running operations
* Add REDIS_BACKOFF_CAP and REDIS_BACKOFF_BASE settings to allow
adjustment of retry timing in worst-case scenarios without code changes
* Simplify Redis retry tests by removing unnecessary reload logic
Update schema upload workflows to organize S3 files by product name:
- Upload schemas to s3://awx-public-ci-files/{product}/{branch}/schema.json
- Update Makefile to download from product-specific paths for schema diff
- Update feature branch deletion to clean up from correct product path
This separates AWX and Tower schemas into distinct S3 folders.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-authored-by: Claude <noreply@anthropic.com>
* add force flag to refspec
* Development of git --amend test
* Update awx/main/tests/live/tests/conftest.py
Co-authored-by: Alan Rominger <arominge@redhat.com>
---------
Co-authored-by: AlanCoding <arominge@redhat.com>
Modify the invocation of @task_awx to accept timeout and
on_duplicate keyword arguments. These arguments are
only used in the new dispatcher implementation.
Add decorator params:
- timeout
- on_duplicate
to tasks to ensure better recovery for
stuck or long-running processes.
---------
Signed-off-by: Seth Foster <fosterbseth@gmail.com>
* Change Swagger UI endpoint from /api/swagger/ to /api/docs/
- Update URL pattern to use /docs/ instead of /swagger/
- Update API root response to show 'docs' key instead of 'swagger'
- Add authentication requirement for schema documentation endpoints
- Update contact email to controller-eng@redhat.com
The schema endpoints (/api/docs/, /api/schema/, /api/redoc/) now
require authentication to prevent unauthorized access to API
documentation.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
* Require authentication for all schema endpoints including /api/schema/
Create custom view classes that enforce authentication for all schema
endpoints to prevent inconsistent access control where UI views required
authentication but the raw schema endpoint remained publicly accessible.
This ensures all schema endpoints (/api/schema/, /api/docs/, /api/redoc/)
consistently require authentication.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
* Add unit tests for authenticated schema view classes
Add test coverage for the new AuthenticatedSpectacular* view classes
to ensure they properly require authentication.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
* remove unused import
---------
Co-authored-by: Claude <noreply@anthropic.com>
* AAP-45927 Add drf-spectacular
- Remove drf-yasg
- Add drf-spectacular
* move SPECTACULAR_SETTINGS from development_defaults.py to defaults.py
* move SPECTACULAR_SETTINGS from development_defaults.py to defaults.py
* Fix swagger tests: enable schema endpoints in all modes
Schema endpoints were restricted to development mode, causing
test_swagger_generation.py to fail. Made schema URLs available in
all modes and fixed deprecated Django warning filters in pytest.ini.
* remove swagger from Makefile
* remove swagger from Makefile
* change docker-compose-build-swagger to docker-compose-build-schema
* remove MODE
* remove unused import
* Update genschema to use drf-spectacular with awx-link dependency
- Add awx-link as dependency for genschema targets to ensure package metadata exists
- Remove --validate --fail-on-warn flags (schema needs improvements first)
- Add genschema-yaml target for YAML output
- Add schema.yaml to .gitignore
* Fix detect-schema-change to not fail on schema differences
Add '-' prefix to diff command so Make ignores its exit status.
diff returns exit code 1 when files differ, which is expected behavior
for schema change detection, not an error.
* Truncate schema diff summary to stay under GitHub's 1MB limit
Limit schema diff output in job summary to first 1000 lines to avoid
exceeding GitHub's 1MB step summary size limit. Add message indicating
when diff is truncated and direct users to job logs or artifacts for
full output.
* readd MODE
* add drf-spectacular to requirements.in and the requirements.txt generated from the script
* Add drf-spectacular BSD license file
Required for test_python_licenses test to pass now that drf-spectacular
is in requirements.txt.
* add licenses
* Add comprehensive unit tests for CustomAutoSchema
Adds 15 unit tests for awx/api/schema.py to improve SonarCloud test
coverage. Tests cover all code paths in CustomAutoSchema including:
- get_tags() method with various scenarios (swagger_topic, serializer
Meta.model, view.model, exception handling, fallbacks, warnings)
- is_deprecated() method with different view configurations
- Edge cases and priority ordering
All tests passing.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
* remove unused imports
---------
Co-authored-by: Claude <noreply@anthropic.com>
* Fix CI: Use Python 3.13 for ansible-test compatibility
ansible-test only supports Python 3.11, 3.12, and 3.13.
Changed collection-integration jobs from '3.x' to '3.13'
to avoid using Python 3.14 which is not supported.
* Fix ansible-test Python version for CI integration tests
ansible-test only supports Python 3.11, 3.12, and 3.13.
Added ANSIBLE_TEST_PYTHON_VERSION variable to explicitly pass
--python 3.13 flag to ansible-test integration command.
This prevents ansible-test from auto-detecting and using
Python 3.14.0, which is not supported.
* Fix CI: Execute ansible-test with Python 3.13 to avoid
unsupported Python 3.14
* Fix CI: Use Python 3.13 across all jobs to avoid Python
3.14 compatibility issues
* Fix CI: Use 'python' and 'ansible.test' module for Python
3.13 compatibility
* Fix CI: Use 'python' instead of 'python3' for Python 3.13
compatibility
* Fix CI: Ensure ansible-test uses Python 3.13 environment
explicitly
* Fix: Remove silent failure check for ansible-core in test suite
* Fix CI: Export PYTHONPATH to make awxkit available to ansible-test
* Fix CI: Use 'python' in run_awx_devel to maintain Python
3.13 environment
* Fix CI: Remove setup-python from awx_devel_image that was resetting Python 3.13 to 3.14
* actually upload PR coverage reports and inject PR number if report is
generated from a PR
* upload general report of devel on merge and make things kinda pretty
* add new file to separate out the schema check so that it is no longer
part of CI check and won't cacuse the whole workflow to fail
* remove old API schema check from ci.yml
Moved the AddField operation before the RunPython operations for 'rename_jts' and 'rename_projects' in migration 0200_template_name_constraint.py. This ensures the new 'org_unique' field exists before related data migrations are executed.
Fix
```
django.db.utils.ProgrammingError: column main_unifiedjobtemplate.org_unique does not exist
```
while applying migration 0200_template_name_constraint.py
when there's a job template or poject with duplicate name in the same org
* Requirements POC docs from Claude Code eval
* Removed unnecessary reference.
* Excluded custom DRF configurations per @AlanCoding
* Implement review changes from @chrismeyersfsu
---------
Co-authored-by: Peter Braun <pbraun@redhat.com>
* prometheus-client returns an additional value as of v.0.22.0
* add license, remove outdated ones, add new embedded sources
* update requirements and UPGRADE BLOCKERs in README
* added sonar config file and started cleaning up
* we do not place the report at the root of the repo
* limit scope to only the awx directory and its contents
* update exclusions for things in awx/ that we don't want covered
settings.SUBSCRIPTIONS_USERNAME and
settings.SUBSCRIPTIONS_CLIENT_ID
should be mutually exclusive. This is because
the POST to api/v2/config/attach/ accepts only
a subscription_id, and infers which credentials to
use based on settings. If both are set, it is ambiguous
and can lead to unexpected 400s when attempting
to attach a license.
Signed-off-by: Seth Foster <fosterbseth@gmail.com>
* Previously, we would error out because we assumed that when we got a
metrics payload from redis, that there was data in it and it was for
the current host.
* Now, we do not assume that since we got a metrics payload, that is
well formed and for the current hostname because the hostname could
have changed and we could have not yet collected metrics for the new
host.
Separate out operation subsystem metrics to fix duplicate error
Remove unnecessary comments
Revert to single subsystem_metrics_* metric with labels
Format via black
* We had race conditions with the system_administrator role being
created just-in-time. Instead of fixing the race condition(s), dodge
them by ensuring the role always exists
* Disconnect logic to fill in role parents
Get tests passing hopefully
Whatever SonarCloud
* remove role parents/children endpoints and related views
* remove duplicate get_queryset method from RoleTeamsList
---------
Co-authored-by: Peter Braun <pbraun@redhat.com>
Allow users to do subscription management using
Red Hat username and password.
In basic auth case, the candlepin API
at subscriptions.rhsm.redhat.com will be used instead
of console.redhat.com.
Signed-off-by: Seth Foster <fosterbseth@gmail.com>
* Added tests for cross org sharing of credentials
* added negative testing for sharing of credentials
* added conditions and tests for roleteamslist regarding cross org credentials
* removed redundant codes
* made error message more articulated and specific
* resolve bug and add simple unit tests
* Update awx_collection/plugins/modules/license.py
Co-authored-by: Andrew Potozniak <tyraziel@gmail.com>
---------
Co-authored-by: Andrew Potozniak <tyraziel@gmail.com>
Bump migrations and delete some files
Resolve remaining conflicts
Fix requirements
Flake8 fixes
Prefer devel changes for schema
Use correct versions
Remove sso connected stuff
Update to modern actions and collection fixes
Remove unwated alias
Version problems in actions
Fix more versioning problems
Update warning string
Messed it up again
Shorten exception
More removals
Remove pbr license
Remove tests deleted in devel
Remove unexpected files
Remove some content missed in the rebase
Use sleep_task from devel
Restore devel live conftest file
Add in settings that got missed
Prefer devel version of collection test
Finish repairing .github path
Remove unintended test file duplication
Undo more unintended file additions
- Add ORG_ADMINS_CAN_SEE_ALL_USERS and MANAGE_ORGANIZATION_AUTH to the
settings_to_migrate list in SettingsMigrator
- Create comprehensive unit tests for SettingsMigrator class with
parameterized test cases
- Tests cover all migration scenarios including the new organizational
settings
- Refactored tests use pytest.mark.parametrize for better maintainability
and coverage
Co-authored-by: Claude <claude@anthropic.com>
Updated setuptools version from 78.1.1 to 80.9.0 in Makefile, requirements.in, and requirements.txt to ensure compatibility and address any potential issues with older versions.
* Allow creating galaxy credential types without an organization (#16077)
* remove requirement for galaxy credentials to belong to an organization
* remove organization check for galaxy credential type
* add functional test
* Update Python dependencies
Relaxed or updated version constraints for several dependencies in requirements files and Makefile, including Cython, asciichartpy, msgpack, python-daemon, and pyyaml. These changes address build issues, remove unnecessary pins, and update to newer compatible versions.
* remove docutils license
* we no longer have this as a dep so we don't need to carry its license
* Update dependencies to address security vulnerabilities
Bumped versions of cryptography, protobuf, and idna in requirements to address CVE-2024-26130, CVE-2025-4565, and CVE-2024-3651. These updates improve security by resolving known vulnerabilities in the affected packages.
---------
Co-authored-by: thedoubl3j <jljacks93@gmail.com>
* fix: do not create multiple mappers for lists of emails or usernames
* fix: create multiple matchers, don't rely on matches_or
* fix tests
* truncate mapper names to a max of 128 chars
* better naming scheme for matchers
* Add LDAP support to gateway_mapping and expand test coverage
- Add new process_ldap_user_list function for LDAP group processing
- Add auth_type parameter to org_map_to_gateway_format and team_map_to_gateway_format
- Support both 'sso' and 'ldap' authentication types in mapping functions
- Fix syntax error and logic bug in existing code
- Add comprehensive unit tests for process_ldap_user_list function (13 test cases)
- Add unit tests for auth_type parameter functionality
- Update helper functions to support new auth_type parameter
- All tests pass and maintain backward compatibility
Technical changes:
- process_ldap_user_list handles None, boolean, string, and list inputs
- Proper type hints with mypy compatibility
- LDAP groups use 'has_or' trigger format vs SSO attribute matching
- Boolean True/False create Always/Never Allow triggers for LDAP
- Maintains proper ordering and mapper structure
Co-authored-by: Claude (Anthropic AI Assistant) <claude@anthropic.com>
* Fix empty list bug in process_ldap_user_list and add comprehensive tests
- Fix process_ldap_user_list to return empty list for empty input instead of creating invalid trigger
- Empty list [] now correctly returns no triggers instead of trigger with empty has_or array
- Add test case for empty list behavior in both LDAP and SSO functions
- Update existing test_empty_list to expect correct behavior (0 triggers)
- Maintain backward compatibility for all other input types
- Comprehensive testing confirms no regression in existing functionality
Bug Details:
- Before: process_ldap_user_list([]) returned [{'name': 'Match User Groups', 'trigger': {'groups': {'has_or': []}}}]
- After: process_ldap_user_list([]) returns [] (correct behavior)
- SSO function already handled this correctly
This prevents potential Gateway issues with empty has_or arrays and ensures logical consistency.
Co-authored-by: Claude (Anthropic AI Assistant) <claude@anthropic.com>
* Add comprehensive LDAP migrator tests and fix category handling
- Add comprehensive unit test suite for LDAPMigrator class (26 tests)
- Test LDAP configuration scenarios including multiple instances, mappings, and edge cases
- Add tests for mixed boolean/group mappings, special characters in org names, and empty configs
- Fix LDAP authenticator category to always be 'ldap' (not 'ldap<suffix>')
- Add auth_type='ldap' parameter to org_map_to_gateway_format and team_map_to_gateway_format calls
- Include AAP-51531 reference comments for specific test cases
- All tests passing (26/26)
Co-authored-by: Claude <claude@anthropic.com>
---------
Co-authored-by: Claude (Anthropic AI Assistant) <claude@anthropic.com>
- Added missing Pattern and Any imports from typing
- Fixed users parameter type hint to include Pattern[str]
- Simplified overly complex return type annotation to use Any
- Added proper type narrowing with isinstance() and cast()
- Resolved mypy errors about incompatible list item types
Co-authored-by: Claude (Anthropic AI Assistant) <claude@anthropic.com>
This commit completely refactors how SSO organization and team mappings are processed
and exported for Gateway authentication, moving from a group-based approach to a more
flexible attribute-based system.
Key Changes:
- Introduced new process_sso_user_list() function for centralized user processing
- Enhanced boolean handling to support both native booleans and string representations
- Added email detection and regex pattern support for flexible user matching
- Refactored trigger generation from groups-based to attributes-based system
Gateway Mapping Enhancements (awx/main/utils/gateway_mapping.py):
- Added email regex detection for automatic email vs username classification
- Added pattern_to_slash_format() for regex pattern conversion
- Enhanced process_sso_user_list() with support for:
- Boolean values: True/False and ["true"]/["false"]
- String usernames and email addresses with automatic detection
- Regex patterns with both username and email matching
- Custom email_attr and username_attr parameters
- Refactored team_map_to_gateway_format() to use new processing system
- Refactored org_map_to_gateway_format() to use new processing system
- Changed trigger structure from {"groups": {"has_or": [...]}} to attribute-based triggers
- Improved naming convention to include trigger type in mapping names
Comprehensive Test Coverage (awx/main/tests/unit/utils/test_auth_migration.py):
- Added complete TestProcessSSOUserList class with 8 comprehensive test methods
- Enhanced TestOrgMapToGatewayFormat with string boolean and new functionality tests
- Enhanced TestTeamMapToGatewayFormat with string boolean and new functionality tests
- Added tests for email detection, regex patterns, and custom attributes
- Verified backward compatibility and integration functionality
- All existing tests updated to work with new attribute-based trigger system
Breaking Changes:
- Trigger structure changed from group-based to attribute-based
- Mapping names now include trigger descriptions for better clarity
- Function signatures updated to include email_attr and username_attr parameters
Co-Authored with Claude-4 via Cursor
Co-authored-by: Peter Braun <pbraun@redhat.com>
Syncing from new rbac to old rbac locally calls the
disable_rbac_sync() context manager.
If rbac sync is disabled, we do not need to remote
sync, as we can assume the remote syncing already
occurred in the viewset.
Signed-off-by: Seth Foster <fosterbseth@gmail.com>
* fix: tacacs+ -> TACACSPLUS
Gateway doesn't allow `+` to be used in slug.
AAP-50774
* Fixed assertion
---------
Co-authored-by: Andrew Potozniak <potozniak@redhat.com>
* Added better error handling and messaging when the service token authentication is broken. Allowed for GATEWAY_BASE_URL to override the service token's base url if it is set in the environment variables.
Co-Authored-By: Cursor (claude-4-sonnet)
* Removed GATEWAY_BASE_URL override for service token auth.
* Working branch for testing DAB RBAC changes
* AAP-48392 Handle DAB RBAC either before or after new type model (for merge) (#16045)
* Handle DAB RBAC either before or after new type model
* Translate CT to DAB CT
* Fix for rearrangement of post_migration methods
* Directly include RBAC service URLs
* Add a run before remote permission additions
* Sync old rbac to remote rbac (#7025)
Signed-off-by: Seth Foster <fosterbseth@gmail.com>
* Set DAB requirement back to devel
---------
Signed-off-by: Seth Foster <fosterbseth@gmail.com>
Co-authored-by: Seth Foster <fosterseth@users.noreply.github.com>
* migrate settings using the existing authenticator framework
* add method to get settings value to gateway client
* add transformer functions for settings
* Switched back to PUT for settings updates
* Started wiring in testing changes
* Added settings_* aggregation results. Added skip-github option. Added tests.
Assisted-by: Cursor
* Added --skip-all-authenticators command line argument. Added GoogleOAuth testing. Added tests for skipping all authenticators.
Assisted-by: Cursor
* wip: migrate other missing settings
* update login_redirect_override in google_oauth2
* impement login redirect for azuread
* implement login redirect for github
* implement login redirect for saml
* set LOGIN_REDIRECT_OVERRIDE even if no authenticator matched
* extract logic for login redirect override to base class
* use urlparse to compare valid redirect urls
* Preserve the original query parameters
* Fix flake8 issues
* Preserve the query parameter in sso_login_url
Gateway sets the sso_login_url to
/api/gateway/social/login/aap-saml-keycloak/?idp=IdP
The idp needs to be preserved when creating the redirect
* Update awx/main/utils/gateway_client.py
Co-authored-by: Chris Meyers <chrismeyersfsu@users.noreply.github.com>
* Update awx/main/management/commands/import_auth_config_to_gateway.py
Co-authored-by: Chris Meyers <chrismeyersfsu@users.noreply.github.com>
* list of settings updated
* Update awx/main/utils/gateway_client.py
Co-authored-by: Chris Meyers <chrismeyersfsu@users.noreply.github.com>
* Update awx/sso/utils/base_migrator.py
Co-authored-by: Chris Meyers <chrismeyersfsu@users.noreply.github.com>
* fix tests
---------
Co-authored-by: Andrew Potozniak <potozniak@redhat.com>
Co-authored-by: Madhu Kanoor <mkanoor@redhat.com>
Co-authored-by: Chris Meyers <chrismeyersfsu@users.noreply.github.com>
* migrate team on team users
add setting to prevent team on team cases. remove tests that should fail now
* adjust tests for disallowing team on teams
* use RoleUserAssignment to retrieve users
* assign users with RoleUserAssignment instead
* fix broken test
* move methods out to utils file. add tests
* add missed positional arg
* test old rbac system also consolidates
* fix test
* feat: AAP-48498 Radius authenticator migrator
Issue: AAP-48498
* fix: Namingm Style and tests
* enabled by default
* test: SECRET is now ignored unless --force is set
* add force flag to enforce updates even when authenticator already exists
* remove cleartext field
* update list of encrypted fields
* show updated and unchanged authenticators in report
* collect controller ldap configuration
* translate role mapping and submit ldap authenticator
* implement require and deny group mapping
* remove all references of awx in the naming
* fix linter issues
* address PR feedback
* update ldap authenticator naming
* update github authenticator naming
* assume that server_uri is always a string
* update order of evaluation for require and deny groups
* cleanup and move ldap related functions into the ldap migrator
* add skip option for saml
* update saml authenticator to new slug format
* update azuread authenticator to new slug format
This PR migrates the SAML configuration from the Controller
to the Gateway, it intentionally skips setting the CALLBACK_URL
so that the Gateway can fill in the appropriate URL.
Remove Controller specific roles
Removes
- Controller Organization Admin
- Controller Organization Member
- Controller Team Admin
- Controller Team Member
- Controller System Auditor
Going forward the platform role definitions
will be used, e.g. Organization Member
The migration will take care of any assignments
with those controller specific roles and use
the platform roles instead.
Signed-off-by: Seth Foster <fosterbseth@gmail.com>
* compare authenticators and mappers before recreating them
* add unit tests
* fix linter errors
* refactor and improve: better implementation for get_authenticator_by_slug and removal of redundant code
* add submit_authenticator method to handle create vs. update in a generic way
* remove unused import
Remove ALLOW_LOCAL_RESOURCE_MANAGEMENT setting and enable local resource management
This commit removes the ALLOW_LOCAL_RESOURCE_MANAGEMENT setting and all associated
functionality, making the behavior as if the setting is always enabled.
Changes:
- Remove ALLOW_LOCAL_RESOURCE_MANAGEMENT setting from defaults.py
- Remove @immutablesharedfields decorator and all related logic
- Remove decorator applications from Organization, Team, and User API views
- Remove role assignment restrictions in UserRolesList and RoleUsersList
- Remove test file for immutablesharedfields functionality
- Clean up unused imports
Result: Organizations, Teams, and Users can now always be created, modified,
and deleted via the API without platform ingress restrictions.
* wip: management command for authenticator export to GateWay
* wip: implement ldap auth config migration
* refactor: split concerns into gathering config and converting / recreating config
* refactor: dry run by default
* use the authenticator slug for idempotency
* move to correct utils path
* use env vars instead of flags, fix linter errors
* remove unused import
* Fix issue where export module does not honor CONTROLLER_OPTIONAL_API_URLPATTERN_PREFIX
* Add unit test and handle leading/trailing slashes
* Reformat
* Refactor for clarity
* Remove unused import
* Move logic to unified job model instead of view
* Refine logic to only apply to double escaped characters to prevent touching unicord chars
* Refine logic to only apply to stdout so that it does not impact webhook notifications
* Revise naming to reflect correction to escapes, not just escape quotes
* Update code comments to reflect fixing double escapes vs double escaped quotes specifically
* Add regex for 5 most common python escape chars to make fix more robust
* Fix issue where export module does not honor CONTROLLER_OPTIONAL_API_URLPATTERN_PREFIX
* Add unit test and handle leading/trailing slashes
* Reformat
* Refactor for clarity
* Remove unused import
* the workflow has been failing silently without catching a merge
conflict. this removes the fail pretty logic previously implemented.
* just fail if a merge conflict is encountered
* Revise start_fact_cache and finish_fact_cache to use JSON file (#15970)
* Revise start_fact_cache and finish_fact_cache to use JSON file with host list inside it
* Revise artifacts path to be relative to the job private_data_dir
* Update calls to start_fact_cache and finish_fact_cache to agree with new reference to artifacts_dir
* Prevents unnecessary updates to ansible_facts_modified, fixing timestamp-related test failures.
* Import bulk_update_sorted_by_id
* Removed assert that calls ansible_facts_new which was removed in the backported pr
* Add import of Host back
* Fix some patterns in collection test playbooks
* Revert change to ansible.builtin.user
* Revert change to WFJT for dup label error
* Add error handling and fix references
* Add back lookup organization
* Fix all remainingfailing syntax in workflow_job_template
* Allow creating galaxy credential types without an organization (#16077)
* remove requirement for galaxy credentials to belong to an organization
* remove organization check for galaxy credential type
---------
Co-authored-by: AlanCoding <arominge@redhat.com>
Co-authored-by: Peter Braun <pbraun@redhat.com>
* Update requirements for setuptools
* first pass and need to commit
* update makefile and run updater script
* updated makefile per readme
* ran updater script
* Patch irc backend to avoid namespace collision w/ jaraco
When importing the IRC backend, jaraco resolves to
the version vendored inside setuptools:
1) importing irc backend…
irc_backend ERROR: ModuleNotFoundError("No module named 'jaraco.stream'")
2) sys.modules['jaraco'] after failure:
present: True
type: <class 'module'>
__file__: /var/lib/awx/venv/awx/lib64/python3.11/site-packages/setuptools/_vendor/jaraco/__init__.py
__path__: ['/var/lib/awx/venv/awx/lib64/python3.11/site-packages/setuptools/_vendor/jaraco']
__spec__: ModuleSpec(name='jaraco',
loader=<_frozen_importlib_external.SourceFileLoader object at 0x7f006a0eccd0>,
origin='/var/lib/awx/venv/awx/lib64/python3.11/site-packages/setuptools/_vendor/jaraco/__init__.py',
submodule_search_locations=['/var/lib/awx/venv/awx/lib64/python3.11/site-packages/setuptools/_vendor/jaraco'])
Since setuptools does not vendor jaraco.stream, it blew up. This patch ensures
jaraco.stream gets imported *before* attempting to import the irc modules.
* Revert "[4.6][dependency] CVE 2025 47273 (#7020)" (#7027)
This reverts commit e8b2920aec.
* reformatted irc backend with black
* ran black to fix linting issues
* Reapply "[4.6][dependency] CVE 2025 47273 (#7020)" (#7027)
This reverts commit 0c6df9b13398a93569fae7558e1a0e72cbe8fb6c.
* add flake8 ignore since jaraco.stream is needed
* jaraco.stream is not directly called in the file but is needed by irc
so ignore the linter failure
---------
Co-authored-by: Shane McDonald <me@shanemcd.com>
* clear LICENSE from cache on change
* Adds tests for license cache clearing
Generated by Cursor (claude-4-sonnet)
* test fixes
Generated with Cursor (claude-4-sonnet)
---------
Signed-off-by: Robin Y Bobbitt <rbobbitt@redhat.com>
Co-authored-by: Jake Jackson <jljacks93@gmail.com>
* fixes UnboundLocalError in POST /attach
* bust cache for credentials before attaching subscription
---------
Signed-off-by: Robin Y Bobbitt <rbobbitt@redhat.com>
* clear LICENSE from cache on change
Signed-off-by: Robin Y Bobbitt <rbobbitt@redhat.com>
* Adds tests for license cache clearing
Generated by Cursor (claude-4-sonnet)
Signed-off-by: Robin Y Bobbitt <rbobbitt@redhat.com>
* test fixes
Generated with Cursor (claude-4-sonnet)
Signed-off-by: Robin Y Bobbitt <rbobbitt@redhat.com>
---------
Signed-off-by: Robin Y Bobbitt <rbobbitt@redhat.com>
Co-authored-by: Jake Jackson <jljacks93@gmail.com>
* fixes UnboundLocalError in POST /attach
Signed-off-by: Robin Y Bobbitt <rbobbitt@redhat.com>
* bust cache for credentials before attaching subscription
Signed-off-by: Robin Y Bobbitt <rbobbitt@redhat.com>
---------
Signed-off-by: Robin Y Bobbitt <rbobbitt@redhat.com>
* Bug fix for AAP-47771 this data migration updates existing CredentialType entries
in the database and changes the kind from github_app to github_app_lookup
* Combine migration 0203 into 0202
* Add test to ensure reconciliation issue has been resolved
* Fix collection task breaking collection ci checks
* Patch ansible.module_utils.basic._ANSIBLE_PROFILE directly
* Conditionalize other santity assertions
* Remove added blank lines and identifier from Fail if absent and no identifier set
* Update collection args (#16025)
* update collection arguments
* Add integration testing for new param
* fix: sanity check failures
---------
Co-authored-by: Sean Sullivan <ssulliva@redhat.com>
Co-authored-by: Alan Rominger <arominge@redhat.com>
* update formatting for sanity testing
* fixing indentation for sanity suite
* adjust tests to use new token name
* update tests to use aap_token instead of controller_oauthtoken
* add back aliases for backward compat
* we have integration tests that still leverage the old token name
* while we can rename these, this tells me that customers might still
have them in the wild and breaking them in a z stream is no bueno
* revert alias changes
---------
Co-authored-by: Peter Braun <pbraun@redhat.com>
Co-authored-by: Sean Sullivan <ssulliva@redhat.com>
Co-authored-by: Alan Rominger <arominge@redhat.com>
* wfjt migration to catch renaming
* Added rename_wfjt function to template constraint migration
* Add test to add duplicate names and verify that the duplicates are renamed
* move object creation
* add missing rename_wfjt operation
* fix linter issues
* fix tox issues
* test manually and move operation
* added back credential type validation code
* Handle DAB RBAC either before or after new type model
* Translate CT to DAB CT
* Fixes for content type switch
* Use more compatible coding pattern
* Deeper purge of content_type_id
* revert, turns out that did not work
* More content type replacements
* Revert changes to serializer
* Revert another content_type change
* Fix for rearrangement of post_migration methods
* Remove thing I am not going to do
* Revert branch pin that was temporary
* add branch sync between release_4.6 and stable-2.6
* add a new workflow to force push commits in release_4.6 to
stable-2.6
* Update workflow to use matrix keyword
---------
Co-authored-by: Jake Jackson
Remove ALLOW_LOCAL_RESOURCE_MANAGEMENT setting and enable local resource management
This commit removes the ALLOW_LOCAL_RESOURCE_MANAGEMENT setting and all associated
functionality, making the behavior as if the setting is always enabled.
Changes:
- Remove ALLOW_LOCAL_RESOURCE_MANAGEMENT setting from defaults.py
- Remove @immutablesharedfields decorator and all related logic
- Remove decorator applications from Organization, Team, and User API views
- Remove role assignment restrictions in UserRolesList and RoleUsersList
- Remove test file for immutablesharedfields functionality
- Clean up unused imports
Result: Organizations, Teams, and Users can now always be created, modified,
and deleted via the API without platform ingress restrictions.
* Mark the collection role module as deprecated
* Mark deprecated in DOCUMENTATION
* Add deprecation info
* Resolve validate-modules deprecation errors
---------
Co-authored-by: Luis <lvilla@redhat.com>
Sort both bulk updates and add batch size to facts bulk update to resolve deadlock issue
Update tests to expect batch_size to agree with changes
Add utility method to bulk update and sort hosts and applied that to the appropriate locations
Update functional tests to use bulk_update_sorted_by_id since update_hosts has been deleted
Add comment NOSONAR to get rid of Sonarqube warning since this is just a test and it's not actually a security issue
Fix failing test test_finish_job_fact_cache_clear & test_finish_job_fact_cache_with_existing_data
---------
Signed-off-by: Seth Foster <fosterbseth@gmail.com>
Co-authored-by: Alan Rominger <arominge@redhat.com>
Co-authored-by: Seth Foster <fosterbseth@gmail.com>
They are only surfaced under pytest 8.4, with `pytest-cov` and
`pytest-xdist` being both active [[1]]. Or equivalent situations
This is a follow-up for #16015 which attempted ignoring the warning
on the runtime level in pytest. Instead, the patch tells `coveragepy`
not to emit said warnings in the first place.
[1]: pytest-dev/pytest-cov#693
It should ideally match perfectly or at least come close, for best
responsiveness. This setting is currently used to prevent Codecov
from publishing incomplete coverage metrics too early.
* Update the workflow to allow the action to write our branches from it.
* Also added username and email as git by default will want to know who
is performing the action (edge case). Using github actions bot is
standard practice
* 🧪 Unpersist Git creds @ cov combine job
This is one of the things Zizmor [[1]] warns about.
[1]: https://docs.zizmor.sh
* 🧪 Download all coverage artifacts in one go
* 🧪 Delegate artifact garbage collection to GH
This is implemented by setting the retention days input to 1 on the
initial upload.
Co-authored-by: 🇺🇦 Sviatoslav Sydorenko (Святослав Сидоренко) <webknjaz@redhat.com>
* 🧪 Unpersist Git creds @ cov combine job
This is one of the things Zizmor [[1]] warns about.
[1]: https://docs.zizmor.sh
* 🧪 Download all coverage artifacts in one go
* 🧪 Delegate artifact garbage collection to GH
This is implemented by setting the retention days input to 1 on the
initial upload.
```
/var/lib/awx/venv/awx/lib64/python3.11/site-packages/_pytest/python.py:163:
PytestReturnNotNoneWarning: Expected None, but
awx/main/tests/unit/test_tasks.py::TestJobCredentials::test_custom_environment_injectors_with_boolean_extra_vars
returned ['successful', 0], which will be an error in a future version
of pytest. Did you mean to use `assert` instead of `return`?
```
* Dug into the git blame for this one
060585434a is the commit for any
historians. It was wrongfully carried over from a mock pexpect
implementation. Our new tests are nice. They don't go as far as trying
to run the task so they do not need to mock pexpect. That is why it is
safe to remove this code without finding it a new home.
Co-authored-by: Chris Meyers <chris.meyers.fsu@gmail.com>
* Make the JT name uniqueness enforced at the database level
* Forgot demo project fixture
* New approach, done by adding a new field
* Update for linters and failures
* Fix logical error in migration test
* Revert some test changes based on review comment
* Do not rename first template, add test
* Avoid name-too-long rename errors
* Insert migration into place
* Move existing files with git
* Bump migrations of existing
* Update migration test
* Awkward bump
* Fix migration file link
* update test reference again
When POSTing to console.redhat.com, fallback
to using basic auth method if OAUTH via
service accounts fails
Signed-off-by: Seth Foster <fosterbseth@gmail.com>
* Add no cov on fail flag to fix CI
* Bump django to 4.2.21
* Coverage args
* Try cov equals awx
* Run migration test in parallel
* Ignore error and clean up commands
* Try to make schema check not hang
---------
Co-authored-by: Satoe Imaishi <simaishi@redhat.com>
* Clean up logging when receptor not running
* Make logging more concise for unregistered case
* Silence another unwanted traceback
* Silence a few more tracebacks
* removing the requirement for re and changing to startswith which the other AAP collections use
* telling sonarqube to ignore this line
* fixing lint error
Fallback to basic auth if OAUTH to console.redhat.com fails
Notes:
Envoy has a timeout of 30 seconds, so
the total timeout should be less than that.
(5, 20) means 5 seconds to connect to server,
20 seconds to start reading data.
Signed-off-by: Seth Foster <fosterbseth@gmail.com>
---------
Signed-off-by: Seth Foster <fosterbseth@gmail.com>
* Fix bug where collectstatic could error due to dispatcherd config
* Revert test because it will not work in test suite
* New publish mocking system
* Remove import of unused
* Fix default publish broker
* Delete existing all-group vars on inventory sync (with overwrite-vars=True) instead of merging them.
* Implementation of inv var handling with file as db.
* Improve serialization to file of inv vars for src update
* Include inventory-level variable editing into inventory source update handling
* Add group vars to inventory source update handling
* Add support for overwrite_vars to new inventory source handling
* Persist inventory var history in the database instead of a file.
* Remove logging which was needed during development.
* Remove further debugging code and improve comments
* Move special handling for user edits of variables into serializers
* Relate the inventory variable history model to its inventory
* Allow for inventory variables to have the value 'None'
* Fix KeyError in new inventory variable handling
* Add unique-together constraint for new model InventoryGroupVariablesWithHistory
* Use only one special invsrc_id for initial update and manual updates
* Fix internal server error when creating a new inventory
* Print the empty string for a variable with value 'None'
* Fix comment which incorrectly states old behaviour
* Fix inventory_group_variables_update tests which did not take the new handling of None into account
* Allow any type for Ansible-core variable values
* Refactor misleading method names
* Fix internal server error when savig vars from group form
* Remove superfluous json conversion in front of JSONField
* Call variable update from create/update instead from validate
* Use group_id instead of group_name in model InventoryGroupVariablesWithHistory
* Disable new variable update handling for all regular (non-'all') groups
* Add live test to verify AAP-17690 (inv var deleted from source)
* Add functional tests to verify inventory variables update logic
* Fix migration which was corrupted by a rebase
* Add a more complex live test and resolve linter complaints
* Force overwrite_vars=False for updates from source on all-group
* Change behavior with respect to overwrite_vars
* Fast fix for old version nodejs
Fixing error
required: { node: '^18.0.0 || >=20.0.0' },
current: { node: 'v16.13.1', npm: '8.5.0' }
* Use node js 18 by default to align with official docs
---------
Co-authored-by: Seth Foster <fosterseth@users.noreply.github.com>
Use dynamic AWX max_workers value
Make basic --status and --running commands work
Make feature flag enabled true by default for development
* [dispatcherd] Dispatcher socket-based `--status` demo working (#15908)
* Fix Task Decorator to Work With and Without Feature Flag (AAP-41775) (#15911)
* refactor(system): extract common heartbeat helpers and split cluster_node_heartbeat
Extract common heartbeat logic into helper functions: _heartbeat_instance_management: consolidates instance management, health checks, and lost-instance detection. _heartbeat_check_versions: compares instance versions and initiates shutdown when necessary. _heartbeat_handle_lost_instances: reaps jobs and marks lost instances offline.
Refactor the original cluster_node_heartbeat to use these helpers and retain legacy behavior (using bind_kwargs).
Introduce adispatch_cluster_node_heartbeat for dispatcherd: uses the control API to retrieve running tasks and reaps them.
Link the two implementations by attaching adispatch_cluster_node_heartbeat as the _new_method on cluster_node_heartbeat.
* feat(publish): delegate heartbeat task submission to new dispatcherd implementation
Update apply_async to check at runtime if FEATURE_NEW_DISPATCHER is enabled.
When the task is cluster_node_heartbeat and a _new_method is attached, delegate the task submission to the new dispatcherd implementation.
Preserve the original behavior for all other tasks and fallback on error.
* refactor(system): extract task ID retrieval from dispatcherd into helper function
Improves readability of adispatch_cluster_node_heartbeat by extracting
the complex UUID parsing logic into a dedicated helper function.
Adds clearer error handling and follows established code patterns.
* fix(dispatcher): Enable task decorator to work with and without feature flag
Implemented a new approach for handling task execution with feature flags
by attaching alternative implementations to apply_async._new_method. This
allows cluster_node_heartbeat to work correctly with both the legacy and
new dispatcher systems without modifying core decorator logic.
AAP-41775
* fix(dispatcher): Improve error handling and logging in feature flag implementation
- Add error handling when attaching alternative dispatcher implementation
- Fix method self-reference in apply_async to properly use cls.apply_async
- Document limitations of this targeted approach for specific tasks
- Add logging for better debugging of dispatcher selection
- Ensure decorator timing by keeping method attachment after function definitions
This completes the robust implementation for switching between dispatcher
implementations based on feature flags.
AAP-41775
* fix(dispatcher): Implement registry pattern for dispatcher feature flag compatibility
Replaces direct method attribute assignment with a global registry for
alternative implementations. The original approach tried to attach new
methods directly to apply_async bound methods, which fails because bound
methods don't support attribute assignment in Python.
The registry pattern:
- Creates a global ALTERNATIVE_TASK_IMPLEMENTATIONS dict in publish.py
- Registers alternative implementations by task name
- Modifies apply_async to check the registry when feature flag is enabled
- Adds extensive logging throughout the process for debugging
This enables cluster_node_heartbeat to work correctly with both the legacy
and new dispatcher implementations based on the FEATURE_NEW_DISPATCHER flag.
AAP-41775
* refactor(dispatcher): Remove excessive logging from dispatcher implementation
Reduces verbose debugging logs while maintaining essential logging for critical
operations. Preserves:
- Task implementation selection based on feature flag
- Registration success/failure messages
- Critical error reporting
Removed:
- Registry content debugging messages
- Repetitive task diagnostics
- Non-essential information logging
AAP-41775
* fix(dispatcher): Fix shallow copy in dispatcher schedule conversion
This resolves "AttributeError: 'float' object has no attribute 'total_seconds'"
errors when the dispatcher is restarted.
Refs: AAP-41775
* Use IPC mechanism to get running tasks (#15926)
* Allow tasks from tasks
* Fix failure to limit to waiting jobs
* Get job record with lock
* Fix failures in dispatcherd feature branch (#15930)
* Fully handle DispatcherCancel
* Complete rest of preload import work
* Complete dispatcherd integration & job cancellation (AAP-43033) (#15941)
* feat(dispatcher): Implement job cancellation for new dispatcher
Adds feature-flag-aware job cancellation that routes cancel requests to either
the legacy dispatcher or the new dispatcherd library based on the
FEATURE_NEW_DISPATCHER flag.
- Updates cancel_dispatcher_process() to use dispatcherd's control API when enabled
- Handles both direct cancellation and task manager workflow cancellation cases
- Works with DispatcherCancel exception handling to properly handle SIGUSR1 signals
AAP-43033
* fix(dispatcher): Update run_dispatcher.py to properly handle task cancellation
Modifies the cancel command in run_dispatcher.py to properly cancel tasks
when the FEATURE_NEW_DISPATCHER flag is enabled, rather than just listing
running tasks.
The implementation translates each task UUID to the appropriate
filter format expected by the dispatcherd control API, maintaining the same
behavior as the original implementation.
Part of: AAP-43033
* refactor(system): Refactor dispatch_startup() to extract common startup logic and branch based on feature flag
This commit refactors the dispatch_startup() function to improve clarity and consistency across the legacy
and new dispatcherd flows.
No dispatcher-specific functionality is needed beyond the changes made, so this refactoring improves robustness without
altering core behavior.
* refactor(system): Refactor inform_cluster_of_shutdown() for clarity
* refactor(tasks): Replace @task with @task_awx across 22 tasks for dispatcher compatibility
- Migrated all task decorators to use @task_awx, ensuring dispatcher-aware behavior.
- Tested each task with the new dispatcherd, verifying that tasks using the registry pattern execute correctly without needing binder‐based alternative implementations.
- Removed redundant logging and outdated comments.
- Legacy tasks that do not require special parameter extraction continue to use their original logic.
- This commit reflects our complete journey of testing and verifying dispatcherd compatibility across all 22 tasks.
* refactor(publish): fix linter
* Fix bug from the branch rebase
* AAP-43763 Add tests for connection management in dispatcherd workers (#15949)
* Add test for job cancel in live tests
* Fix bug from the branch rebase
* Add test for connection recovery after connection broke
* Add test for breaking connection
* Fix dispatcherd bugs: schedule aliases, job kwargs handling, cancel handling (#15960)
* Put in job kwargs handling, not done before
* AAP-44382 [dispatcherd] Fixes for running with feature flag off (#15973)
* Use correct decorator for test of tasks
* Finalize dispatcherd feature branch (#15975)
* Work dispatcherd into dependency management system
* Use util methods from DAB
* Rename the dispatcherd feature flag, and flip default to not-enabled
* Move to new submit_task method
* Update the location of the sock file
* AAP-44381 Make dispatcherd config loading more lazy (#15979)
* Make dispatcherd config loading more lazy
* Make submission error more obvious
* Fix signal handling gap, hijack SIGUSR1 from dispatcherd (#15983)
* Fix signal handling gap, hijack SIGUSR1 from dispatcherd
* Minor adjustments to dispatcherd status command
* [dispatcherd] Get rid of alternative task registry (#15984)
Get rid of alternative task registry
* Fix deadlock error and other cleanup errors (#15987)
* Move to proper error handling location
---------
Co-authored-by: artem_tiupin <70763601+art-tapin@users.noreply.github.com>
Retire workers after a certain age, allowing them to finish their
current task if they are not idle.
This mitigates any issues like memory leaks in long running workers,
especially if systems stay busy for months at a time.
Introduce new optional setting WORKER_MAX_LIFETIME_SECONDS, defaulting to 4 hours
* add license expiry metric
* Update metrics test with default value to the new license metrics
* Add the changes of the black-lint command
* Update awx/main/analytics/metrics.py
---------
Co-authored-by: Seth Foster <fosterseth@users.noreply.github.com>
With the "recent" changes making the lookup plugin `awx.awx.schedule_rrule` and
`awx.awx.schedule_rruleset` returning a list instead of string (see #15625), the
returned list (which will *always* carry only 1 item) needs to be transformed
to a string either adding `| join` or `| first`. I found `first` to be more
fitting as the list will *always* return a list with 1 item.
Additionally, the documentation that references `awx.awx.schedule_rruleset`
in the `awx.awx.schedule` module was wrong, which is also fixed by this PR.
Signed-off-by: Steffen Scheib <sscheib@redhat.com>
Co-authored-by: Steffen Scheib <steffen@scheib.me>
Retire workers after a certain age, allowing them to finish their
current task if they are not idle.
This mitigates any issues like memory leaks in long running workers,
especially if systems stay busy for months at a time.
Introduce new optional setting WORKER_MAX_LIFETIME_SECONDS, defaulting to 4 hours.
* Make the JT name uniqueness enforced at the database level
* Forgot demo project fixture
* New approach, done by adding a new field
* Update for linters and failures
* Fix logical error in migration test
* Revert some test changes based on review comment
* Do not rename first template, add test
* Avoid name-too-long rename errors
* Insert migration into place
* Move existing files with git
* Bump migrations of existing
* Update migration test
* Awkward bump
* Fix migration file link
* update test reference again
* Revise start_fact_cache and finish_fact_cache to use JSON file with host list inside it
* Revise artifacts path to be relative to the job private_data_dir
* Update calls to start_fact_cache and finish_fact_cache to agree with new reference to artifacts_dir
* Prevents unnecessary updates to ansible_facts_modified, fixing timestamp-related test failures.
* Delete existing all-group vars on inventory sync (with overwrite-vars=True) instead of merging them.
* Implementation of inv var handling with file as db.
* Improve serialization to file of inv vars for src update
* Include inventory-level variable editing into inventory source update handling
* Add group vars to inventory source update handling
* Add support for overwrite_vars to new inventory source handling
* Persist inventory var history in the database instead of a file.
* Remove logging which was needed during development.
* Remove further debugging code and improve comments
* Move special handling for user edits of variables into serializers
* Relate the inventory variable history model to its inventory
* Allow for inventory variables to have the value 'None'
* Fix KeyError in new inventory variable handling
* Add unique-together constraint for new model InventoryGroupVariablesWithHistory
* Use only one special invsrc_id for initial update and manual updates
* Fix internal server error when creating a new inventory
* Print the empty string for a variable with value 'None'
* Fix comment which incorrectly states old behaviour
* Fix inventory_group_variables_update tests which did not take the new handling of None into account
* Allow any type for Ansible-core variable values
* Refactor misleading method names
* Fix internal server error when savig vars from group form
* Remove superfluous json conversion in front of JSONField
* Call variable update from create/update instead from validate
* Use group_id instead of group_name in model InventoryGroupVariablesWithHistory
* Disable new variable update handling for all regular (non-'all') groups
* Add live test to verify AAP-17690 (inv var deleted from source)
* Add functional tests to verify inventory variables update logic
* Fix migration which was corrupted by a rebase
* Add a more complex live test and resolve linter complaints
* Force overwrite_vars=False for updates from source on all-group
* Change behavior with respect to overwrite_vars
Ensure service account authentication is being used
when falling back to using SUBSCRIPTIONS_CLIENT_ID.
Additional change:
Subscription data can return two types of capacities:
Sockets and Nodes
For determining overall capacity
if capacity name is Nodes:
capacity quantity x subscription quantity
if capacity name is Sockets:
capacity quantity / 2 (minimum of 1) x subscription quantity
Signed-off-by: Seth Foster <fosterbseth@gmail.com>
* Update subscription API to use service accounts
Update code to pull subscriptions from
console.redhat.com instead of
subscription.rhsm.redhat.com
Uses service account client ID and client secret
instead of username/password, which is being
deprecated in July 2025.
Additional changes:
- In awx.awx.subscriptions module, use new service
account params rather than old basic auth params
- Update awx.awx.license module to use subscription_id
instead of pool_id. This is due to using a different API,
which identifies unique subscriptions by subscriptionID
instead of pool ID.
Signed-off-by: Seth Foster <fosterbseth@gmail.com>
Co-authored-by: Chris Meyers <chris.meyers.fsu@gmail.com>
Co-authored-by: Peter Braun <pbraun@redhat.com>
* fix token name
Signed-off-by: Seth Foster <fosterbseth@gmail.com>
* Fix Subscriptions credentials fallback
Ensure service account authentication is being used
when falling back to using SUBSCRIPTIONS_CLIENT_ID.
Additional change:
Subscription data can return two types of capacities:
Sockets and Nodes
For determining overall capacity
if capacity name is Nodes:
capacity quantity x subscription quantity
if capacity name is Sockets:
capacity quantity / 2 (minimum of 1) x subscription quantity
Signed-off-by: Seth Foster <fosterbseth@gmail.com>
---------
Signed-off-by: Seth Foster <fosterbseth@gmail.com>
Co-authored-by: Chris Meyers <chris.meyers.fsu@gmail.com>
Co-authored-by: Peter Braun <pbraun@redhat.com>
* Demo of sorting hosts live test
* Sort both bulk updates and add batch size to facts bulk update to resolve deadlock issue
* Update tests to expect batch_size to agree with changes
* Add utility method to bulk update and sort hosts and applied that to the appropriate locations
Remove unused imports
Add utility method for sorting bulk updates
Remove try except OperationalError for loop
Remove unused import of django.db.OperationalError
Remove batch size as it is now on the bulk update utility method as 100
Remove batch size here since it is specified in sortedbulkupdate
Add transaction.atomic to have entire transaction is run as a signle transaction before committing to the db
Revert change to bulk update as it's not needed here and just sort instead
Move bulk_sorted utility method into db.py and updated name to not be specific to Hosts
Revise to import bulk_update_sorted.. rather than calling it as an argument
Fix way I'm importing bulk_update_sorted.. Remove unneeded Host import and remove calls to bul_update as args
Rebise calls to bulk_update_sorted.. to include Host in the args
REmove raw_update_hosts method and replace with bulk_update_sorted_by_id in update_hosts
Remove update_hosts function and replace with bulk_update_sorted_by_id
Update live tests to use bulk_update_sorted_by_id
Fix the fields in bulk_update to agree with test
* Update functional tests to use bulk_update_sorted_by_id since update_hosts has been deleted
Replace update_hosts with bulk_update_sorted_by_id
Remove referenes to update_hosts
Update corresponding fact cachin tests to use bulk_update_sorted_by_id
Remove import of bulk_sorted_update
Add code comment to live test to silence Sonarqube hotspot
* Add comment NOSONAR to get rid of Sonarqube warning since this is just a test and it's not actually a security issue
Get test_finish_job_fact_cache_with_existing_data passing
Get test_finish_job_fact_cache_clear passing
Remove reference to raw_update and replace with new bulk update utility method
Add pytest.mark.django_db to appropriate tests
Corrent which model is called in bulk_update_sorted_by_id
Remove now unused Host import
Point to where bulk_update_sorted_by_id to where that is actually being used
Correct import of bulk_update_sorted_by_id
Revert changes in this file to avoid db calls issue
Remove @pytest.mark.django_db from unit tests
Remove commented out host sorting suggested fix
Fix failing tests test_pre_post_run_hook_facts_deleted_sliced & test_pre_post_run_hook_facts
Remove atomic transaction line, add return, and add docstring
* Fix failing test test_finish_job_fact_cache_clear & test_finish_job_fact_cache_with_existing_data
---------
Co-authored-by: Alan Rominger <arominge@redhat.com>
Update code to pull subscriptions from
console.redhat.com instead of
subscription.rhsm.redhat.com
Uses service account client ID and client secret
instead of username/password, which is being
deprecated in July 2025.
Additional changes:
- In awx.awx.subscriptions module, use new service
account params rather than old basic auth params
- Update awx.awx.license module to use subscription_id
instead of pool_id. This is due to using a different API,
which identifies unique subscriptions by subscriptionID
instead of pool ID.
Signed-off-by: Seth Foster <fosterbseth@gmail.com>
Co-authored-by: Chris Meyers <chris.meyers.fsu@gmail.com>
Co-authored-by: Peter Braun <pbraun@redhat.com>
* Django url validators support ipv6, our custom URLField
allow_plain_hostname feature was messing with the hostname before it
was passed to Django validators.
* This change dodges the allow_plan_hostname transformations if the
value looks like it's an ipv6 one.
* Django url validators support ipv6, our custom URLField
allow_plain_hostname feature was messing with the hostname before it
was passed to Django validators.
* This change dodges the allow_plan_hostname transformations if the
value looks like it's an ipv6 one.
* Dump running tasks when running out of capacity
* Use same logic for max_workers and capacity
* Address case where CPU capacity is the constraint
* Add a test for correspondence
* Fake redis to make tests work
* Added test_jobs.py to the model unit test folder in orther to show the undesired behaviour with fact cache
* Added test_jobs.py to the model unit test folder in orther to show the undesired behaviour with fact cache
* Solved undesired behaviour with fact_cache
* Solved bug with slices
* Remove unused imports
Remove now unused line of code which was commented out by the contributor
Revert "Remove now unused line of code which was commented out by the contributor"
This reverts commit f1a056a2356d56bc7256957a18503bd14dcfd8aa.
* Add back line that had been commented out as this line makes hosts specific to the particular slice when applicable
Revise private_data_dir fixture to see if it improves code coverage
Checked out awx/main/tests/unit/models/test_jobs.py in devel to see if it resolves git diff issue
* Fix formatting in awx/main/tests/unit/models/test_jobs.py
Rename for loop from host in hosts to hosts in hosts_cahced and remove unneeded continue
Revise finish_fact_cache to utilize inventory rather than hosts
Remove local var hosts that was assigned but unused
Revert change in start_fact_cache hosts_cached back to hosts
Revise the way we are handling hosts_cached and joining the file
Revert "Revise the way we are handling hosts_cached and joining the file"
This reverts commit e6e3d2f09c1b79a9bce3647a72e3dad97fe0aed8.
Reapply "Revise the way we are handling hosts_cached and joining the file"
This reverts commit a42b7ae69133fee24d3a5f1b456d9c343d111df9.
Revert some of my changes to get back to a better working state
Rename for loop to host in hosts_cached and remove unneeded continue
Remove jobs job.get_hosts_for_fact_cache() from post run hook, fix if statement after continue block, and revise how we are calling hosts in finish for loop
Add test_invalid_host_facts to test_jobs to increase code coverage
Update method signature to use hosts_cached and updated other references to hosts in finish_facts_cached
Rename hosts iterator to hosts_cached to agree with naming elsewhere
Revise test_invalid_host_facts to get more code coverage
Revise test_invalid_host_facts to increase codecov
Revise test_pre_post_run_hook_facts_deleted_sliced to ensure we are hitting the assertionerror for code cov
Revise mock_inventory.hosts. to hit assert failure
Add revision of hosts and facts to force failure to satisfy code cov
Fix failure in test_pre_post_run_hook_facts_deleted_sliced
Add back for loop to create failures and add assert to hit them
Remove hosts.iterator() from both start_fact_cache and finish_fact_cache
Remove unused import of Queryset to satisfy api-lint
Fix typo in docstring hasnot to has not
Move hosts_cached.append(host) to outer loop in start_fact_cache
Add class to help support cached hosts resolving host.name issue with hosts_cached
* Add live tests for ansible facts
Remove fixture needed for local work only maybe
Revert "Add class to help support cached hosts resolving host.name issue with hosts_cached"
This reverts commit 99d998cfb9960baafe887de80bd9b01af50513ec.
* Move hosts_cached.append(host) outside of try except
* Move hosts_cached.append(host) to the beginning of start_fact_cache
---------
Signed-off-by: onetti7 <davonebat@gmail.com>
Co-authored-by: onetti7 <davonebat@gmail.com>
Co-authored-by: Alan Rominger <arominge@redhat.com>
* Add in ESXI plugin as a choice
* added in vmware esxi as an inventory source
* made a migration that may not be needed but will need to circle back
* black formatting
* linter fixes that I missed in the first commit, squash
* Update esxi to use_fqcn to true
* added use_fqcn on the esxi cred to true to correctly lay down
collection name
* add fqcn true
* updated vmware esxi to use true for fqcn
* Update defaults and re-order migrations
* updated defaults to add correct env var to get empty
* re-ordered migrations to be in line with others
* Add condition to replace vmware_esxi cred
* replace direct name match with vmware cred since source supports old
cred
* add skeleton test
* quick pass, needs more
* squash this
* Add tests for creating inventory ESXI source
* add test case to test creating an inventory with different cred type
to source name
* update test and linting
* added correct cred return since esxi uses same cred
* assert on status code
* assert that we received a 204
* Added new folder for vmware_exsi and empty json file.
* Corrected the misspelling of folder name to 'esxi'
* fixed misspelling for `vmware_`
---------
Co-authored-by: Thanhnguyet Vo <thavo@redhat.com>
* feat: 38589 GitHub App Authentication (#15807)
* feat: 38589 GitHub App Authentication
Allows both git@<personal-token> and x-access-token@<github-access-token> when authenticating using git.
This allows GitHub App tokens to work without interfering with existing authentication types.
---------
Co-authored-by: Jake Jackson <thedoubl3j@Jakes-MacBook-Pro.local>
* revert change made to allow UI to accept x-access-token, just use htt… (#15851)
revert change made to allow UI to accept x-access-token, just use https:// instead
* Add Github dep for new cred support if used (#15850)
* Add pygithub for new app token support
* fixed git requirements file with new
* added new github dep and relevant deps it needs
* add required licenses
* Add artifacts to satisfy license check
* Remove duplicated license
---------
Co-authored-by: Andrea Restle-Lay <arestlel@redhat.com>
Co-authored-by: Alan Rominger <arominge@redhat.com>
* Remove deps update it came with the cherry-pick and is not needed in this version
Remove unneeded deps updates from requirements.in
Remove point to awx-plugins as it is not needed in tower
* Add a credential plugin that uses GitHub Apps to get tokens
* Add github app tests
* Ran requirements updater script
Ran black on github_app_test to fix formatting issue
Add scm_github_app to managed credentials
Ran updater script to reflect new deps
Added github app info to def build_passwords in jobs.py, cred now appears in credential types
Update ManagedCredentialType for GitHub App to match what we have in awx-plugins
Update inputs in maManagedCredentialType to github_app_inputs to communicate with awx/main/credential_plugins/github_app.py
Revert incorrect change in ManagedCredentialType, change github_app_lookup to call inputs instead of github_app_inputs
Updated namespace to github_app_lookup to agree with nomenclature used in the rest of the implementation and to resolve failing API test
Remove import pointing to awx plugins and update to point to credential_plugins
Remove references to gh_app_plugin_mod and change to github_app
Remove from awx_plugins.interfaces._temporary_private_django_api import ( # noqa: WPS436 to resolve failing test
Remove flake8 typing & typing references that do not exist in this version of Tower
Remove references in jobs.py and __init__.py since this is an external cred type and registered it in setup.cfg instead
Remove blank line
REvise name in cfg from github_app_lookup to github_app to see if that ifxes module not found error
Revise first declaration of github_app to agree with file name to see if that resolves issue
Rename line 174 to agree with what's in config
Fix reference to github_app_lookup to github_app
Linters compliaining about the github_app in __all__ not being defined, renamed to see if that aligns them
Fix naming in test to correspond to naming of cred type
Update naming to be more specific and add blank line to setup.cfg
Remove __all__ from githubapp.py to satisfy linters
Revert formatting change since it is not needed in this repository
* Add blank line at the end of requirements.in
---------
Co-authored-by: Andrea Restle-Lay <andrearestlelay@gmail.com>
Co-authored-by: Jake Jackson <thedoubl3j@Jakes-MacBook-Pro.local>
Co-authored-by: Jake Jackson <jljacks93@gmail.com>
Co-authored-by: Andrea Restle-Lay <arestlel@redhat.com>
Co-authored-by: Alan Rominger <arominge@redhat.com>
* Support <collection_namespace>.<collection_name>.* indirect host query
to match ANY module in the <collection_namespace>.<collection_name>
* Add tests for new wildcard indirect host count
* error checking of ansible event name
* error checking of ansible event query
* Added test_jobs.py to the model unit test folder in orther to show the undesired behaviour with fact cache
Signed-off-by: onetti7 <davonebat@gmail.com>
* Added test_jobs.py to the model unit test folder in orther to show the undesired behaviour with fact cache
Signed-off-by: onetti7 <davonebat@gmail.com>
* Solved undesired behaviour with fact_cache
Signed-off-by: onetti7 <davonebat@gmail.com>
* Solved bug with slices
Signed-off-by: onetti7 <davonebat@gmail.com>
* Remove unused imports
Remove now unused line of code which was commented out by the contributor
Revert "Remove now unused line of code which was commented out by the contributor"
This reverts commit f1a056a2356d56bc7256957a18503bd14dcfd8aa.
* Add back line that had been commented out as this line makes hosts specific to the particular slice when applicable
Revise private_data_dir fixture to see if it improves code coverage
Checked out awx/main/tests/unit/models/test_jobs.py in devel to see if it resolves git diff issue
* Fix formatting in awx/main/tests/unit/models/test_jobs.py
Rename for loop from host in hosts to hosts in hosts_cahced and remove unneeded continue
Revise finish_fact_cache to utilize inventory rather than hosts
Remove local var hosts that was assigned but unused
Revert change in start_fact_cache hosts_cached back to hosts
Revise the way we are handling hosts_cached and joining the file
Revert "Revise the way we are handling hosts_cached and joining the file"
This reverts commit e6e3d2f09c1b79a9bce3647a72e3dad97fe0aed8.
Reapply "Revise the way we are handling hosts_cached and joining the file"
This reverts commit a42b7ae69133fee24d3a5f1b456d9c343d111df9.
Revert some of my changes to get back to a better working state
Rename for loop to host in hosts_cached and remove unneeded continue
Remove jobs job.get_hosts_for_fact_cache() from post run hook, fix if statement after continue block, and revise how we are calling hosts in finish for loop
Add test_invalid_host_facts to test_jobs to increase code coverage
Update method signature to use hosts_cached and updated other references to hosts in finish_facts_cached
Rename hosts iterator to hosts_cached to agree with naming elsewhere
Revise test_invalid_host_facts to get more code coverage
Revise test_invalid_host_facts to increase codecov
Revise test_pre_post_run_hook_facts_deleted_sliced to ensure we are hitting the assertionerror for code cov
Revise mock_inventory.hosts. to hit assert failure
Add revision of hosts and facts to force failure to satisfy code cov
Fix failure in test_pre_post_run_hook_facts_deleted_sliced
Add back for loop to create failures and add assert to hit them
Remove hosts.iterator() from both start_fact_cache and finish_fact_cache
Remove unused import of Queryset to satisfy api-lint
Fix typo in docstring hasnot to has not
Move hosts_cached.append(host) to outer loop in start_fact_cache
Add class to help support cached hosts resolving host.name issue with hosts_cached
* Add live tests for ansible facts
Remove fixture needed for local work only maybe
Revert "Add class to help support cached hosts resolving host.name issue with hosts_cached"
This reverts commit 99d998cfb9960baafe887de80bd9b01af50513ec.
* Move hosts_cached.append(host) outside of try except
* Move hosts_cached.append(host) to the beginning of start_fact_cache
---------
Signed-off-by: onetti7 <davonebat@gmail.com>
Co-authored-by: onetti7 <davonebat@gmail.com>
Co-authored-by: Alan Rominger <arominge@redhat.com>
Bug Error reporting and handling in GH14575/GH12682
This targets a bug that tries to parse blank string as None for panelid
and dashboardid.
It also prints more errors reporting to the console to diagnose
reporting issues
Co-authored-by: Lila Yasin <lyasin@redhat.com>
* Prevent job pod from mounting serviceaccount token
* Add serializer validation for cg pod_spec_override
Prevent automountServiceAccountToken to be set to true and provide an error message when automountServiceAccountToken is being set to true
* Move call to django_validate_password to the correct method were the user object is available.
* Added tests for the Django password validation functionality.
* Move call to django_validate_password to the correct method were the user object is available.
* Added tests for the Django password validation functionality.
* Time has passed. Channels (4.2.0) no longer raises a deprecation
warning for this case. It used to (4.1.0).
* We do NOT serve http requests over daphne, this is the default
behavior of ProtocolTypeRouter() when the http param is NOT included
* Time has passed. Channels (4.2.0) no longer raises a deprecation
warning for this case. It used to (4.1.0).
* All is good. No code changes needed for this. We do NOT service http
requests over daphne, just websockets. We, correctly, do NOT supply
the http key so daphne does NOT service http requests.
* Support <collection_namespace>.<collection_name>.* indirect host query
to match ANY module in the <collection_namespace>.<collection_name>
* Add tests for new wildcard indirect host count
* error checking of ansible event name
* error checking of ansible event query
* feat: do not count dark hosts as updated (#15872)
* feat: do not count dark hosts as updated
* update functional tests
* Fix test flake due to host metric id enumeration (#15875)
---------
Co-authored-by: Alan Rominger <arominge@redhat.com>
* Update docs with a few more things
* update about use of PAT
* update around managing output from the script
* Fix spacing and empty line
* finish run on sentence
* update requirements with extra dep needed
* Add `opa_query_path field` for Inventory, Organization and JobTemplate models (#6850)
Add `opa_query_path` model field to Inventory, Organizatio and JobTemplate. Add migration file and expose opa_query_path field in the related API serializers.
* Gather and evaluate `opa_query_path` fields and raise violation exceptions (#6864)
gather and evaluate all opa query related to a job execution during policy evaluation phase
* Add OPA_AUTH_CUSTOM_HEADERS support (#6863)
* Extend policy input data serializers (#6890)
* Extend policy input data serializers
* Update help text for PaC related fields (#6891)
* Remove encrypted from OPA_AUTH_CUSTOMER_HEADER
Unable to encrypt a dict field
---------
Co-authored-by: Jiří Jeřábek (Jiri Jerabek) <Jerabekjirka@email.cz>
Co-authored-by: Alexander Saprykin <cutwatercore@gmail.com>
Co-authored-by: Tina Tien <98424339+tiyiprh@users.noreply.github.com>
Modifies to_container_path to accept an optional
container_root parameter.
Normally this defaults to /runner, but in K8S
environments, project updates run from the
private_data_dir, e.g. /tmp/awx_1_123abc, not
/runner.
In that situation, we just pass in private_data_dir
as the container_root.
---------
Signed-off-by: Seth Foster <fosterbseth@gmail.com>
Co-authored-by: TVo <thavo@redhat.com>
In OCP/K8S, projects run in the task pod's ee container. The private_data_dir is not extracted to /runner. Instead, the project update runs directly from the mounted in private_data_dir, e.g. /tmp/awx_1_abcd.
When injecting a credential that uses extra vars, we pass the private_data_dir as as the container_root, so that the correct command line argument is generated, e.g. "-e /tmp/awx_1_abcd/env/extra_var_file".
Signed-off-by: Seth Foster <fosterbseth@gmail.com>
* Create a new pytest folder for live system testing with normal services (#15688)
* PoC for running dev env tests
* Replace in github actions
* Move folder to better location
* Further streamlining of new test folders
* Consolidate fixture, add writeup docs
* Use star import
* Push the wait-for-job to the conftest
Fix misused project cache identifier (#15690)
Fix project cache identifiers for new updates
Finish test and discover viable solution
Add comment on related task code
AAP-37989 Tests for exclude list with multiple jobs (#15722)
* Tests for exclude list with multiple jobs
Create test for using manual & file projects (#15754)
* Create test for using a manual project
* Chang default project factory to git, remove project files monkeypatch
* skip update of factory project
* Initial file scaffolding for feature
* Fill in galaxy and names
* Add README, describe project folders and dependencies
Add ee cleanup tests
* Adds cleanup tests to the live test.
Fix rsyslog permission error in github ubuntu tests from apparmor (#15717)
* Add test to detect rsyslog config problems
* Get dmesg output
* Disable apparmor for rsyslogd
Make awx/main/tests/live dramatically faster (#15780)
* Make awx/main/tests/live dramatically faster
* Add new setting to exclude list
* Fix rebase issues
* Did not want to backport this
* Feature indirect host counting (#15802)
* AAP-37282 Add parse JQ data and test it for a `job` object in isolation (#15774)
* Add jq dependency
* Add file in progress
* Add license for jq
* Write test and get it passing
* Successfully test collection of `event_query.yml` data (#15761)
* Callback plugin method from cmeyers adapted to global collection list
Get tests passing
Mild rebranding
Put behind feature flag, flip true in dev
Add noqa flag
* Add missing wait_for_events
* feat: try grabbing query files from artifacts directory (#15776)
* Contract changes for the event_query collection callback plugin (#15785)
* Minor import changes to collection processing in callback plugin
* Move agreed location of event_query file
* feat: remaining schema changes for indirect host audits (#15787)
* Re-organize test file and move artifacts processing logic to callback (#15784)
* Rename the indirect host counting test file
* Combine artifacts saving logic
* Connect host audit model to jq logic via new task
* Add unit tests for indirect host counting (#15792)
* Do not get django flags from database (#15794)
* Document, implement, and test remaining indirect host audit fields (#15796)
* Document, implement, and test remaining indirect host audit fields
* Fix hashing
* AAP-39559 Wait for all event processing to finish, add fallback task (#15798)
* Wait for all event processing to finish, add fallback task
* Add flag check to periodic task
* feat: cleanup of old indirect host audit records (#15800)
* By default, do not count indirect hosts (#15801)
* By default, do not count indirect hosts
* Fix copy paste goof
* Fix linter issue from base branch
* prevent multiple tasks from processing the same job events, prevent p… (#15805)
prevent multiple tasks from processing the same job events, prevent periodic task from spawning another task per job
* Fix typos and other bugs found by Pablo review
* fix: rely on resolved_action instead of task, adapt to proposed query… (#15815)
* fix: rely on resolved_action instead of task, adapt to proposed query structure
* tests: update indirect host tests
* update remaining queries to new format
* update live test
* Remove polling loop for job finishing event processing (#15811)
* Remove polling loop for job finishing event processing
* Make awx/main/tests/live dramatically faster (#15780)
* AAP-37282 Add parse JQ data and test it for a `job` object in isolation (#15774)
* Add jq dependency
* Add file in progress
* Add license for jq
* Write test and get it passing
* Successfully test collection of `event_query.yml` data (#15761)
* Callback plugin method from cmeyers adapted to global collection list
Get tests passing
Mild rebranding
Put behind feature flag, flip true in dev
Add noqa flag
* Add missing wait_for_events
* feat: try grabbing query files from artifacts directory (#15776)
* Contract changes for the event_query collection callback plugin (#15785)
* Minor import changes to collection processing in callback plugin
* Move agreed location of event_query file
* feat: remaining schema changes for indirect host audits (#15787)
* Re-organize test file and move artifacts processing logic to callback (#15784)
* Rename the indirect host counting test file
* Combine artifacts saving logic
* Connect host audit model to jq logic via new task
* Document, implement, and test remaining indirect host audit fields (#15796)
* AAP-39559 Wait for all event processing to finish, add fallback task (#15798)
* Wait for all event processing to finish, add fallback task
* Add flag check to periodic task
* feat: cleanup of old indirect host audit records (#15800)
* prevent multiple tasks from processing the same job events, prevent p… (#15805)
prevent multiple tasks from processing the same job events, prevent periodic task from spawning another task per job
* Remove polling loop for job finishing event processing (#15811)
* Make awx/main/tests/live dramatically faster (#15780)
* reorder migrations to allow indirect instances backport
* cleanup for rebase and merge into devel
---------
Co-authored-by: Peter Braun <pbraun@redhat.com>
Co-authored-by: jessicamack <jmack@redhat.com>
Co-authored-by: Peter Braun <pbranu@redhat.com>
* AAP-37282 Add parse JQ data and test it for a `job` object in isolation (#15774)
* Add jq dependency
* Add file in progress
* Add license for jq
* Write test and get it passing
* Successfully test collection of `event_query.yml` data (#15761)
* Callback plugin method from cmeyers adapted to global collection list
Get tests passing
Mild rebranding
Put behind feature flag, flip true in dev
Add noqa flag
* Add missing wait_for_events
* feat: try grabbing query files from artifacts directory (#15776)
* Contract changes for the event_query collection callback plugin (#15785)
* Minor import changes to collection processing in callback plugin
* Move agreed location of event_query file
* feat: remaining schema changes for indirect host audits (#15787)
* Re-organize test file and move artifacts processing logic to callback (#15784)
* Rename the indirect host counting test file
* Combine artifacts saving logic
* Connect host audit model to jq logic via new task
* Add unit tests for indirect host counting (#15792)
* Do not get django flags from database (#15794)
* Document, implement, and test remaining indirect host audit fields (#15796)
* Document, implement, and test remaining indirect host audit fields
* Fix hashing
* AAP-39559 Wait for all event processing to finish, add fallback task (#15798)
* Wait for all event processing to finish, add fallback task
* Add flag check to periodic task
* feat: cleanup of old indirect host audit records (#15800)
* By default, do not count indirect hosts (#15801)
* By default, do not count indirect hosts
* Fix copy paste goof
* Fix linter issue from base branch
* prevent multiple tasks from processing the same job events, prevent p… (#15805)
prevent multiple tasks from processing the same job events, prevent periodic task from spawning another task per job
* Fix typos and other bugs found by Pablo review
* fix: rely on resolved_action instead of task, adapt to proposed query… (#15815)
* fix: rely on resolved_action instead of task, adapt to proposed query structure
* tests: update indirect host tests
* update remaining queries to new format
* update live test
* Remove polling loop for job finishing event processing (#15811)
* Remove polling loop for job finishing event processing
* Make awx/main/tests/live dramatically faster (#15780)
* AAP-37282 Add parse JQ data and test it for a `job` object in isolation (#15774)
* Add jq dependency
* Add file in progress
* Add license for jq
* Write test and get it passing
* Successfully test collection of `event_query.yml` data (#15761)
* Callback plugin method from cmeyers adapted to global collection list
Get tests passing
Mild rebranding
Put behind feature flag, flip true in dev
Add noqa flag
* Add missing wait_for_events
* feat: try grabbing query files from artifacts directory (#15776)
* Contract changes for the event_query collection callback plugin (#15785)
* Minor import changes to collection processing in callback plugin
* Move agreed location of event_query file
* feat: remaining schema changes for indirect host audits (#15787)
* Re-organize test file and move artifacts processing logic to callback (#15784)
* Rename the indirect host counting test file
* Combine artifacts saving logic
* Connect host audit model to jq logic via new task
* Document, implement, and test remaining indirect host audit fields (#15796)
* Document, implement, and test remaining indirect host audit fields
* Fix hashing
* AAP-39559 Wait for all event processing to finish, add fallback task (#15798)
* Wait for all event processing to finish, add fallback task
* Add flag check to periodic task
* feat: cleanup of old indirect host audit records (#15800)
* prevent multiple tasks from processing the same job events, prevent p… (#15805)
prevent multiple tasks from processing the same job events, prevent periodic task from spawning another task per job
* Remove polling loop for job finishing event processing (#15811)
* Remove polling loop for job finishing event processing
* Make awx/main/tests/live dramatically faster (#15780)
* temp
* remove test
* reorder migrations to allow indirect instances backport
* cleanup for rebase and merge into devel
---------
Co-authored-by: Peter Braun <pbraun@redhat.com>
Co-authored-by: jessicamack <jmack@redhat.com>
Co-authored-by: Peter Braun <pbranu@redhat.com>
* Add pygithub for new app token support
* fixed git requirements file with new
* added new github dep and relevant deps it needs
* add required licenses
* Add artifacts to satisfy license check
* Remove duplicated license
---------
Co-authored-by: Andrea Restle-Lay <arestlel@redhat.com>
Co-authored-by: Alan Rominger <arominge@redhat.com>
* feat: 38589 GitHub App Authentication
Allows both git@<personal-token> and x-access-token@<github-access-token> when authenticating using git.
This allows GitHub App tokens to work without interfering with existing authentication types.
---------
Co-authored-by: Jake Jackson <thedoubl3j@Jakes-MacBook-Pro.local>
* Publish image base on git repo name instead of hard coded to AWX (#15828)
* Fix git credential for devel_image build (#15834)
* Continue if pre-warm cache fail in container build (#15835)
* Use correct devel image for docker-compose (#15836)
* [AAP-39138] - Add DAB Feature Flag common API (#15786)
* Update django-ansible-base reference to ansible-automation-platform/django-ansible-base@stable-2.5
---------
Co-authored-by: Zack Kayyali <zkayyali@redhat.com>
Fixes an issue where schedules were not running at
the correct time.
Details:
DST is Daylights Saving Time
If the rrule dtstart is "in" a DST period (i.e., March to November)
and the current date is outside of the DST, then the fast forwarding
is not correct.
This is because datetime timedeltas do not honor DST boundaries
The Fix:
Convert the rrule dtstart to UTC before doing operations. Then,
convert back to the original timezone at the end.
Signed-off-by: Seth Foster <fosterbseth@gmail.com>
Fixes an issue where schedules were not running at
the correct time.
Details:
DST is Daylights Saving Time
If the rrule dtstart is "in" a DST period (i.e., March to November)
and the current date is outside of the DST, then the fast forwarding
is not correct.
This is because datetime timedeltas do not honor DST boundaries
The Fix:
Convert the rrule dtstart to UTC before doing operations. Then,
convert back to the original timezone at the end.
Signed-off-by: Seth Foster <fosterbseth@gmail.com>
* fix backend attribute error
* managedcredential may now contain 2 different classes
* managedcredentialType and one that represents a lookup plugin
* conditionalize creation params
* added a conditional statement to filter our external types
* all external credentials are managed by awx/aap
Adds fields client_id and client_secret which
will result in authentication via service account
on console.redhat.com
Signed-off-by: Seth Foster <fosterbseth@gmail.com>
* Create test for using a manual project
* Chang default project factory to git, remove project files monkeypatch
* skip update of factory project
* Initial file scaffolding for feature
* Fill in galaxy and names
* Add README, describe project folders and dependencies
* - add new entry points
- add logic to check what version of the project is running
* remove former discovery method
* update custom_injectors and remove unused import
* fix how we load external creds
* remove stale code to match devel
* fix cloudforms test and move credential loading
* add load credentials method to get tests passing
* Conditionalize integration tests if the cred is present
* remove inventory source test
* inventory source is covered in the workflow job template target
* Update dependencies to fix offline build
* Downgrade cryptography due to compatibility issue with openssl
* Downgrade setuptools
* Run update script to assure constraints work
* Maintain pin on cryptography
* Small adjustment to comment
---------
Co-authored-by: Satoe Imaishi <simaishi@redhat.com>
* Fix incorrectly passed keywords with exclude-strings arg to ansible-runner worker cleanup command
* Keep the quotes for each arg and adjust test_receptor
---------
Signed-off-by: Sasa Jovicic <jovicic.sasa@hotmail.com>
Co-authored-by: Sasa Jovicic <jovicic.sasa@hotmail.com>
* General upgrade of dependencies
* adjust licenses to match requirements
* add missing licenses
* another pass to fix licenses
* Try easy for for psycopg encoding pattern change
---------
Co-authored-by: jessicamack <jmack@redhat.com>
* Opened PR to test AAP-36522.
* Test pin specific ansible-core version in the run_awx_devel actions.
* Fixed syntax for ansible-core version.
* updated makefile to match syntax for ansible version
* Reverts makefile changes; fixed syntax for upgrade ansible-core action.
* Fix incorrectly passed keywords with exclude-strings arg to ansible-runner worker cleanup command
Signed-off-by: Sasa Jovicic <jovicic.sasa@hotmail.com>
* Keep the quotes for each arg and adjust test_receptor
---------
Signed-off-by: Sasa Jovicic <jovicic.sasa@hotmail.com>
* host_metrics date fix
* AAP-36839 Remove excess comments
* fix extra date() conversion
* actual fix
* datetime is a library, use datetime.datetime
---------
Co-authored-by: Andrea Restle-Lay <arestlel@arestlel-thinkpadx1carbongen9.rht.csb>
* host_metrics date fix
* AAP-36839 Remove excess comments
* fix extra date() conversion
* actual fix
* datetime is a library, use datetime.datetime
---------
Co-authored-by: Andrea Restle-Lay <arestlel@arestlel-thinkpadx1carbongen9.rht.csb>
* pull the correct collection plugin for the product
* remove unused import and logging line
* refactor code to load entry points
* reformat method
* lint fix
* renames for clarity and a lint fix
* move function to utils
* move the rest of the code into load_inventory_plugins
* temp - confirm that tests will pass
* revert change caught in merge
* change back requirement
the related PR has been merged
Fixes a bug where a schedule that was created
to run only once will continue to run repeatedly.
e.g. an rrule with
dtstart 20240730; count 1; freq MINUTELY
This job will run on 20240730, and should never
run again.
However, the next time the schedule
update_computed_fields runs, the dtstart
will fast forward to today's date, and
next_run will be computed from that. This will trigger
the job to run again, which is not intended.
If count is set, we just should not fast forward the
rrule and always calculate next_run based on original
dtstart.
Signed-off-by: Seth Foster <fosterbseth@gmail.com>
Fixes a bug where a schedule that was created
to run only once will continue to run repeatedly.
e.g. an rrule with
dtstart 20240730; count 1; freq MINUTELY
This job will run on 20240730, and should never
run again.
However, the next time the schedule
update_computed_fields runs, the dtstart
will fast forward to today's date, and
next_run will be computed from that. This will trigger
the job to run again, which is not intended.
If count is set, we just should not fast forward the
rrule and always calculate next_run based on original
dtstart.
Signed-off-by: Seth Foster <fosterbseth@gmail.com>
the primary fix is to simply add an exception class
to those caught in the except block
This also adds live tests for the general scenario
although this does not hit the new exception type
* Unit tests do not create CredentialType records for Credential
plugins. Instead, they explicitly instantiate CredentialType(s) for
Credential plugins. They rely on CredentialType.defaults[key] to do
so. This change makes sure custom_injectors get bolted onto the
created CredentialType.
Fix error creating partition due to uncaught exception
the primary fix is to simply add an exception class
to those caught in the except block
This also adds live tests for the general scenario
although this does not hit the new exception type
* PoC for running dev env tests
* Replace in github actions
* Try non interactive
* Move folder to better location
* Further streamlining of new test folders
* Consolidate fixture, add writeup docs
* Use star import
* Push the wait-for-job to the conftest
* Remove oauth provider
This removes the oauth provider functionality from awx. The
oauth2_provider app and all references to it have been removed.
Migrations to delete the two tables that locally overwrote
oauth2_provider tables are included. This change does not include
migrations to delete the tables provided by the oauth2_provider app.
Also not included here are changes to awxkit, awx_collection or the ui.
* Fix linters
* Update migrations after rebase
* Update collection tests for auth changes
The changes in https://github.com/ansible/awx/pull/15554 will cause a
few collection tests to fail, depending on what the test configuration
is. This changes the tests to look for a specific warning rather than
counting the number of warnings emitted.
* Update migration
* Removed unused oauth_scopes references
---------
Co-authored-by: Mike Graves <mgraves@redhat.com>
Co-authored-by: Alan Rominger <arominge@redhat.com>
In essence, this configures Python to turn any warnings emitted in
runtime into errors[[1]]. This is the best practice that allows
reacting to future deprecation announcements that are coming from the
dependencies (direct, or transitive, or even CPython itself)[[2]].
The typical workflow looks like this:
1. If a dependency is updated an a warning is hit in tests, the
deprecated thing should be replaced with newer APIs.
2. If a dependency is transitive or we have no control over it
otherwise, the specific warning and a regex matching its message,
plus the module reference (where possible) can be added to the
list of temporary ignores in `pytest.ini`.
3. The list of temporary ignores should be reevaluated periodically,
including when dependency re-pinning in lockfile is happening.
[1]: https://docs.python.org/3/using/cmdline.html#cmdoption-W
[2]: https://pytest-with-eric.com/configuration/pytest-ignore-warnings/
* Add descriptions for plugin names
* Update serializers to display plugin and plugin description
* Add function to extract plugin name descriptions
* Add description for scm
* Conditionalize scm and file descriptions
By stable, we mean future occurrences of the rrule
should be the same before and after the fast forward
operation.
The problem before was that we were fast forwarding to
7 days ago. For some rrules, this does not retain the old
occurrences. Thus, jobs would launch at unexpected times.
This change makes sure we fast forward in increments of
the rrule INTERVAL, thus the new dtstart should be in the
occurrence list of the old rrule.
Additionally, code is updated to fast forward
EXRULE (exclusion rules) in addition to RRULE
---------
Signed-off-by: Seth Foster <fosterbseth@gmail.com>
Co-authored-by: 🇺🇦 Sviatoslav Sydorenko (Святослав Сидоренко) <wk.cvs.github@sydorenko.org.ua>
* Replaced with larger graphic.
* Revert "Replaced with larger graphic."
This reverts commit 1214b00052.
* Removed UI-focused user docs from AWX.
* Fixed indentation for release notes
* Removed/updated image files no longer needed.
By stable, we mean future occurrences of the rrule
should be the same before and after the fast forward
operation.
The problem before was that we were fast forwarding to
7 days ago. For some rrules, this does not retain the old
occurrences. Thus, jobs would launch at unexpected times.
This change makes sure we fast forward in increments of
the rrule INTERVAL, thus the new dtstart should be in the
occurrence list of the old rrule.
Additionally, code is updated to fast forward
EXRULE (exclusion rules) in addition to RRULE
---------
Signed-off-by: Seth Foster <fosterbseth@gmail.com>
Co-authored-by: 🇺🇦 Sviatoslav Sydorenko (Святослав Сидоренко) <wk.cvs.github@sydorenko.org.ua>
* Add back git requirements as comments
* Add comment to commented out git lines for clarity
* Re run the updater script
* Add new licenses
* Fix library name
* AAP 2.5 Controller 4.6 Org, User, and Team endpoints are restricted.
When the user performs a restricted operation via the Controller
collection, kindly notify them that they should be using the platform
collection instead.
This is a rather hacky, but fixes the DRF pages when going through a
trusted proxy.
Notably: This is meant to primarily fix the DRF pages on downstream
builds while leaving the upstream to function as-is.
When using a trusted proxy, the DRF login and logout endpoints now
redirect to the Platform login page (which respects ?next) and logout
endpoint respectively.
The CSS and JS is inlined because the trusted proxy might only proxy
to /api/ and not /static/ which is a harder problem to solve.
Signed-off-by: Rick Elrod <rick@elrod.me>
* Add gateway support to awxkit
This updates awxkit to add support for gateway when fetching oauth
tokens, which is used during the `login` subcommand. awxkit will first
try fetching a token from gateway and if that fails, fallback to
existing behavior. This change is backwards compatible.
Signed-off-by: Mike Graves <mgraves@redhat.com>
* Address review feedback
This:
* adds coverage for the get_oauth2_token() method
* changes AuthUrls to a TypedDict
* changes the url used for personal token access in gateway
* Address review feedback
This is just minor stylistic changes.
---------
Signed-off-by: Mike Graves <mgraves@redhat.com>
Remove RADIUS authentication from AWX
Do not remove models fields and tables let it for a stage where all the work of removing external auth finished AAP-27707
Co-authored-by: Hao Liu <44379968+TheRealHaoLiu@users.noreply.github.com>
This is a rather hacky, but fixes the DRF pages when going through a
trusted proxy.
Notably: This is meant to primarily fix the DRF pages on downstream
builds while leaving the upstream to function as-is.
When using a trusted proxy, the DRF login and logout endpoints now
redirect to the Platform login page (which respects ?next) and logout
endpoint respectively.
The CSS and JS is inlined because the trusted proxy might only proxy
to /api/ and not /static/ which is a harder problem to solve.
Signed-off-by: Rick Elrod <rick@elrod.me>
UI_NEXT is no longer being served in 4.6
Removing static file dir for ui-next fixes
```
WARNINGS:
?: (staticfiles.W004) The directory '/var/lib/awx/venv/awx/lib64/python3.11/site-packages/awx/ui_next/build' in the STATICFILES_DIRS setting does not exist.
CommandError: Inventory with id = 1038181458411569545 cannot be found
```
* Fix CI for newer debian image (#15583)
* Fix CI for newer debian image
Signed-off-by: Rick Elrod <rick@elrod.me>
* Missed one
Signed-off-by: Rick Elrod <rick@elrod.me>
---------
Signed-off-by: Rick Elrod <rick@elrod.me>
* Update ci.yml
---------
Signed-off-by: Rick Elrod <rick@elrod.me>
Co-authored-by: Rick Elrod <rick@elrod.me>
* Fix CI for newer debian image
Signed-off-by: Rick Elrod <rick@elrod.me>
* Missed one
Signed-off-by: Rick Elrod <rick@elrod.me>
---------
Signed-off-by: Rick Elrod <rick@elrod.me>
* Add `awx_plugins.interfaces` runtime dependency
* Use `awx_plugins.interfaces` for runtime detection
The original function name was `server_product_name()` but it didn't
really represent what it did. So it was renamed into
`detect_server_product_name()` in an attempt of disambiguation.
* Use `awx_plugins.interfaces` to map container path
The original function `to_container_path` has been renamed into
`get_incontainer_path()` to represent what it does better and make
the imports more obvious.
* Add license file for awx_plugins.interfaces
---------
Co-authored-by: Hao Liu <44379968+TheRealHaoLiu@users.noreply.github.com>
* There isn't a great reason to allow the UI to edit these meta-data
fields that denote the last time an analytics job ran.
* The only reason I hesitate to mark them uneditable in the API is that
they are useful to change in order to influence when the jobs run.
Mostly for debug purposes or 1-off.
* Removed files from AWX that were moved to awx-plugins.
* Removed credential plugins file from AWX.
* Resolved broken build: added back missing graphics and removed obsolete xrefs.
* There isn't a great reason to allow the UI to edit these meta-data
fields that denote the last time an analytics job ran.
* The only reason I hesitate to mark them uneditable in the API is that
they are useful to change in order to influence when the jobs run.
Mostly for debug purposes or 1-off.
Adding credential and execution environment roles
validates that the user belongs to the same org
as the credential or EE.
In some situations, the user-org membership has not
yet been synced from gateway to controller.
In this case, controller will make a request to
gateway to check if the user is part of the org.
Signed-off-by: Seth Foster <fosterbseth@gmail.com>
Adding credential and execution environment roles
validates that the user belongs to the same org
as the credential or EE.
In some situations, the user-org membership has not
yet been synced from gateway to controller.
In this case, controller will make a request to
gateway to check if the user is part of the org.
Signed-off-by: Seth Foster <fosterbseth@gmail.com>
These tests are known to only be executed partially or not at all. So
we always get incomplete, missing, and sometimes flaky, coverage in
the test functions that are expected to fail.
This change updates the ``coverage.py`` config to prevent said tests
from influencing the coverage level measurement.
Ref https://github.com/pytest-dev/pytest/pull/12531
# Add a postfix to the UI URL patterns for UI URL generated by the API
# example if set to '' UI URL generated by the API for jobs would be $TOWER_URL/jobs
# example if set to 'execution' UI URL generated by the API for jobs would be $TOWER_URL/execution/jobs
Co-authored-by: Hao Liu <44379968+TheRealHaoLiu@users.noreply.github.com>
# Add a postfix to the UI URL patterns for UI URL generated by the API
# example if set to '' UI URL generated by the API for jobs would be $TOWER_URL/jobs
# example if set to 'execution' UI URL generated by the API for jobs would be $TOWER_URL/execution/jobs
* Register all discovered CredentialType(s) after Django finishes
loading
* Protect parallel registrations using shared postgres advisory lock
* The down-side of this is that this will run when it does not need to,
adding overhead to the init process.
* Only register discovered credential types in the database IF
migrations have ran and are up-to-date.
User and Team assignments using the DAB
RBAC system will be translated back to the old
Role system.
This ensures better backward compatibility and
addresses some inconsistences in the UI that were
relying on older RBAC endpoints.
Signed-off-by: Seth Foster <fosterbseth@gmail.com>
Co-authored-by: Alan Rominger <arominge@redhat.com>
This is to emphasize that this role is specific
to controller component. That is, not an auditor
for the entire AAP platform.
Signed-off-by: Seth Foster <fosterbseth@gmail.com>
User and Team assignments using the DAB
RBAC system will be translated back to the old
Role system.
This ensures better backward compatibility and
addresses some inconsistences in the UI that were
relying on older RBAC endpoints.
Signed-off-by: Seth Foster <fosterbseth@gmail.com>
Co-authored-by: Alan Rominger <arominge@redhat.com>
* #egg _could_ be awx-plugins.some.other.provided.package
* Also point at ansible devel instead of a forked branch since the
entrypoints PR has now merged to devel
Fixes bug where creating a new user will
request a new awx_sessionid cookie, invalidating
the previous session.
Do not refresh session if updating or
creating a password for a different user.
Signed-off-by: Seth Foster <fosterbseth@gmail.com>
* Prevent job pod from mounting serviceaccount token
* Add serializer validation for cg pod_spec_override
Prevent automountServiceAccountToken to be set to true and provide an error message when automountServiceAccountToken is being set to true
* Fallback to use subscription cred for analytic
Fall back to use SUBSCRIPTION_USERNAME/PASSWORD to upload analytic to if REDHAT_USERNAME/PASSWORD are not set
* Improve error message
* Guard against request with no query or data
* Add test for _send_to_analytics
Focus on credentials
* Supress sonarcloud warning about password
* Add test for analytic ship
Co-authored-by: Hao Liu <44379968+TheRealHaoLiu@users.noreply.github.com>
* Fallback to use subscription cred for analytic
Fall back to use SUBSCRIPTION_USERNAME/PASSWORD to upload analytic to if REDHAT_USERNAME/PASSWORD are not set
* Improve error message
* Guard against request with no query or data
* Add test for _send_to_analytics
Focus on credentials
* Supress sonarcloud warning about password
* Add test for analytic ship
This is to emphasize that this role is specific
to controller component. That is, not an auditor
for the entire AAP platform.
Signed-off-by: Seth Foster <fosterbseth@gmail.com>
Adds the following managed Role Definitions
Controller Team Admin
Controller Team Member
Controller Organization Admin
Controller Organization Member
These have the same permission set as the
platform roles (without the Controller prefix)
Adding members to teams and orgs via the legacy RBAC system
will use these role definitions.
Other changes:
- Bump DAB to 2024.08.22
- Set ALLOW_LOCAL_ASSIGNING_JWT_ROLES to False in defaults.py.
This setting prevents assignments to the platform roles (e.g. Team Member).
Signed-off-by: Seth Foster <fosterbseth@gmail.com>
* Rewrite more access logic in terms of permissions instead of roles
* Cut down supported logic because that would not work anyway
* Remove methods not needed anymore
* Create managed roles in test before delegating permissions
Adds the following managed Role Definitions
Controller Team Admin
Controller Team Member
Controller Organization Admin
Controller Organization Member
These have the same permission set as the
platform roles (without the Controller prefix)
Adding members to teams and orgs via the legacy RBAC system
will use these role definitions.
Other changes:
- Bump DAB to 2024.08.22
- Set ALLOW_LOCAL_ASSIGNING_JWT_ROLES to False in defaults.py.
This setting prevents assignments to the platform roles (e.g. Team Member).
Signed-off-by: Seth Foster <fosterbseth@gmail.com>
* replace ansiconv with ansi2html
The ansiconv package is archived so I'm replacing it with a similar package that's still actively being worked on.
* remove minimum version
The version minimum was used to get the latest version while running the upgrader
* set minimum version for ansi2html
* provide usage info
* Rewrite more access logic in terms of permissions instead of roles
* Cut down supported logic because that would not work anyway
* Remove methods not needed anymore
* Create managed roles in test before delegating permissions
I had the luck of running into this race condition that broke my deployment. No instance was ever able to register because on running "awx-manage" in some check of a setting, it would end up failing here with
```
File "/var/lib/awx/venv/awx/lib64/python3.11/site-packages/awx/conf/license.py", line 10, in _get_validated_license_data
return get_licenser().validate()
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/var/lib/awx/venv/awx/lib64/python3.11/site-packages/awx/main/utils/licensing.py", line 453, in validate
automated_since = int(Instance.objects.order_by('id').first().created.timestamp())
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'created'
```
I had the luck of running into this race condition that broke my deployment. No instance was ever able to register because on running "awx-manage" in some check of a setting, it would end up failing here with
```
File "/var/lib/awx/venv/awx/lib64/python3.11/site-packages/awx/conf/license.py", line 10, in _get_validated_license_data
return get_licenser().validate()
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/var/lib/awx/venv/awx/lib64/python3.11/site-packages/awx/main/utils/licensing.py", line 453, in validate
automated_since = int(Instance.objects.order_by('id').first().created.timestamp())
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'created'
```
unpin django-guid and update license
there's no reason listed for the pin and the changelog doesn't describe any changes that should block a full upgrade. they changed licenses to MIT
* unpin django-split-settings
blocker is 2 years old. upgrading to see if the previous issue is still present. upgrading to a version with Python 3.11 support
* remove UPGRADE BLOCKER in README
* unpin channels-redis
The bug that initially caused the upgrade block has been resolved https://github.com/django/channels_redis/issues/332
* replace aioredis Exception with a redis Exception
Version 4.0.0 of channel-redis migrated the underlying Redis library from aioredis to redis-py. The Exception has been changed to an equivalent
* remove unused license
* remove UPGRADE BLOCKER in README
* remove hiredis
it was an indirect dependency from aioredis which was removed
* remove unused license
* add back hiredis
it's potentially providing a performance boost. install explicitly as a part of redis. upgrade to more recent version
* remove UPGRADE BLOCKER for hiredis
it was also addressed as a part of this PR
```
/var/lib/awx/venv/awx/lib64/python3.11/site-packages/_pytest/python.py:163:
PytestReturnNotNoneWarning: Expected None, but
awx/main/tests/unit/test_tasks.py::TestJobCredentials::test_custom_environment_injectors_with_boolean_extra_vars
returned ['successful', 0], which will be an error in a future version
of pytest. Did you mean to use `assert` instead of `return`?
```
* Dug into the git blame for this one
060585434a is the commit for any
historians. It was wrongfully carried over from a mock pexpect
implementation. Our new tests are nice. They don't go as far as trying
to run the task so they do not need to mock pexpect. That is why it is
safe to remove this code without finding it a new home.
Fixes bug where creating a new user will
request a new awx_sessionid cookie, invalidating
the previous session.
Do not refresh session if updating or
creating a password for a different user.
Signed-off-by: Seth Foster <fosterbseth@gmail.com>
* unpin channels-redis
The bug that initially caused the upgrade block has been resolved https://github.com/django/channels_redis/issues/332
* replace aioredis Exception with a redis Exception
Version 4.0.0 of channel-redis migrated the underlying Redis library from aioredis to redis-py. The Exception has been changed to an equivalent
* remove unused license
* remove UPGRADE BLOCKER in README
* remove hiredis
it was an indirect dependency from aioredis which was removed
* remove unused license
* add back hiredis
it's potentially providing a performance boost. install explicitly as a part of redis. upgrade to more recent version
* remove UPGRADE BLOCKER for hiredis
it was also addressed as a part of this PR
Change django url dispatcher to serve up ui_next files instead of old ui files
Old UI will not be served with this change
Github CI still runs old ui tests (to be removed in another PR)
Remove the Github workflows that build old UI
---------
Signed-off-by: Seth Foster <fosterbseth@gmail.com>
- use asyncio.get_running_loop() instead of passing around event_loops
- add name to all of the asyncio tasks for easier debugging
we are trying to figure out which task is
```
Task was destroyed but it is pending!
task: <Task pending name='Task-<id>' coro=<RedisConnection._read_data() done, defined at /var/lib/awx/venv/awx/lib64/python3.9/site-packages/aioredis/connection.py:180> wait_for=<Future pending cb=[<TaskWakeupMethWrapper object at 0x7fba77bf1700>()]> cb=[RedisConnection.__init__.<locals>.<lambda>() at /var/lib/awx/venv/awx/lib64/python3.9/site-packages/aioredis/connection.py:168]>
```
is referring to
* Replaced all references of downstream docs to upstream docs.
* Update README.md
Co-authored-by: Don Naro <dnaro@redhat.com>
* Update README.md.j2
Co-authored-by: Don Naro <dnaro@redhat.com>
* Update README.md.j2
Co-authored-by: Don Naro <dnaro@redhat.com>
* Incorpor'd review feedback from @oraNod and @samccann
* Updated with agreed link (for now) until further change is needed.
---------
Co-authored-by: Don Naro <dnaro@redhat.com>
If RECEPTOR_KEEP_WORK_ON_ERROR is set to true receptor work unit will not be automatically released
Co-Authored-By: Chris Meyers <chrismeyersfsu@users.noreply.github.com>
Links are use to indicate network connectivity and optionally provide alias
it is not needed for communication since all the container are on the awx network
in the prometheus container case since awx_ container now have valid hostname it's no longer required (also i think the link is missing a `-` anyway...)
links also implicitly imply dependency between services in this i see awx container depends on redis and postgres so i switch to depends_on to retain that
Making this change to be podman compatible
because i get
```
Error response from daemon: bad parameter: link is not supported
```
Old RBAC system hits DOESNOTEXIST query errors
if a user deletes an org while a workflow job is active.
The error is triggered by
1. starting workflow job
2. delete the org that the workflow job is a part of
3. The workflow changes status (e.g. pending to waiting)
This error message would surface
awx.main.models.rbac.Role.DoesNotExist: Role matching
query does not exist.
The fix is wrap the query in a try catch, and skip
over some logic if the roles don't exist.
---------
Signed-off-by: Seth Foster <fosterbseth@gmail.com>
* update terminology
Replace some instances of Tower with AWX and remove some references to
enterprise left over from the migration of RST content from the
Automation Controller docs.
* Update docs/docsite/rst/userguide/overview.rst
Co-authored-by: TVo <thavo@redhat.com>
---------
Co-authored-by: TVo <thavo@redhat.com>
* Check upstream django-ansible-base releases. If the version upstream
does not match the version we are pinned to then submit a PR with the
upstream version.
A user needs to be a member of the org
in order to use a credential in that org.
We were incorrectly checking for "change"
permission of the org, instead of "member".
Signed-off-by: Seth Foster <fosterbseth@gmail.com>
Fix command to set db session timeout
Add quote around the value of the setting
Example failures
```
2024-07-10 13:33:29,237 ERROR [a7e55a64e6744a0e920bb1fd78615e5f] awx.main.dispatch Worker failed to run task awx.main.tasks.system.awx_periodic_scheduler(*[], **{}
Traceback (most recent call last):
File "/var/lib/awx/venv/awx/lib64/python3.11/site-packages/django/db/backends/utils.py", line 87, in _execute
return self.cursor.execute(sql)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/var/lib/awx/venv/awx/lib64/python3.11/site-packages/psycopg/cursor.py", line 732, in execute
raise ex.with_traceback(None)
psycopg.errors.SyntaxError: trailing junk after numeric literal at or near "1d"
LINE 1: SET idle_in_transaction_session_timeout = 1d
^
```
Timestamp for activity stream is not indexed result in slow query. switching to ID (which effectively will is order by created time) to improve performance
Validate role assignment if org defined
Check that organization is defined on credential
before running queries.
Fixes a "None type does not have attribute id" error.
Signed-off-by: Seth Foster <fosterbseth@gmail.com>
* Add test that we got all permissions right for every role
* Fix missing Org execute role and missing adhoc role permission
* Add in missing Organization Approval Role as well
* Remove Role from role names
Utilizes the `validate_role_assignment` callback
from dab (see dab PR #490) to prevent granting credential
access to a user of another organization.
This logic will work for role_user_assignments
and role_team_assignments endpoints.
Signed-off-by: Seth Foster <fosterbseth@gmail.com>
* Added new OpenShift Virtualization inventory source to docs.
* Incorporated review feedback from @fosterseth and @TheRealHaoLiu.
* Fixed link to correct kubevirt.core.kubevirt documentation.
* Add better 403 error message for Job template create
To create Job template u need access to projects and inventory
---------
Co-authored-by: Chris Meyers <chris.meyers.fsu@gmail.com>
* Add initial test for deletion of stale permission
* Delete existing EE view permission
* Hypothetically complete update of EE model permissions setup
* Tests passing locally
* Issue with user_capabilities was a test bug, fixed
* Add TASK_MANAGER_LOCK_TIMEOUT
`TASK_MANAGER_LOCK_TIMEOUT` controls the `idle_in_transaction_session_timeout` and `idle_session_timeout` configuration for task manager connections and lock in database
hope to prevent the situation that the task instance that holds the lock becomes unresponsive and preventing other instance to be able to run task manager
* Add session timeout to periodic scheduler and all sub task manager locks
Workaround
```
ERROR awx/main/tests/functional/test_licenses.py - pip._vendor.distlib.DistlibException: Unable to locate finder for 'pip._vendor.distlib'
```
* Add migration testing for certain managed roles
* Fix managed role bugs
* Add more tests
* Fix another bug with org workflow admin role reference
* Add test because another issue is fixed
* Mark reason for test
* Remove internal markers
* Reword failure message
Co-authored-by: Seth Foster <fosterseth@users.noreply.github.com>
---------
Co-authored-by: Seth Foster <fosterseth@users.noreply.github.com>
Script was falsely identifying cross-linked
parents. It needs to check if parent roles if
content type is Team and role_field is
member_role OR admin_role.
Signed-off-by: Seth Foster <fosterbseth@gmail.com>
* Include a bit of context into the name of the delete function. The
HTTP_ added prepended string may be unexpected if Django's header
transformation isn't top of mind.
rename AWX_DIRECT_SHARED_RESOURCE_MANAGEMENT_ENABLED
to
ALLOW_LOCAL_RESOURCE_MANAGEMENT
- clearer meaning
- drop prefix so the same setting is used across the platform
Signed-off-by: Seth Foster <fosterbseth@gmail.com>
This is actually happening for one customer, though it seems like it
shouldn't be if the foreign key constraint is set back up properly.
In order to recreate it, I had to add the constraint back with 'NOT
VALID' added on to prevent the check.
* Periodically sync from share resource provider
- add periodic task `periodic_resource_sync` run once every 15 min
- if `RESOURCE_SERVER` is not configured sync will not run
- only 1 node
example RESOURCE_SERVER configuration
```
RESOURCE_SERVER = {
"URL": "<resource server url>",
"SECRET_KEY": "<resource server auth token>",
"VALIDATE_HTTPS": <True/False>,
}
RESOURCE_SERVICE_PATH = <resource_service_path>
```
If more than one schedule for a unified job template
is removed at once, a race condition can arise.
example scenario: delete schedules with ids 7 and 8
- unified job template next_schedule is currently 7
- on delete of schedule 7, update_computed_fields will try to set
next_schedule to 8
- but while this logic is occurring, another transaction
is deleting 8
This leads to a db IntegrityError
The solution here is to call select_for_update() on the
next schedule, so that 8 cannot be deleted until
the transaction for deleting 7 is completed.
Signed-off-by: Seth Foster <fosterbseth@gmail.com>
It looks like we can't do upserts currently without dropping to raw
SQL, but if we wrap each batch in a transaction, that should insure
that each is updated with the correct count.
PG_TLS=true make docker-compose
This will add some extra startup commands
for the postgres container to generate a key and
cert to use for postgres connections.
It will also mount in pgssl.conf which has ssl configuration.
This can be useful for debugging issues that only surface
when using ssl postgres connections.
* Prevent modifying shared resources
Adds a class decorator to prevent modifying shared resources
when gateway is being used.
AWX_DIRECT_SHARED_RESOURCE_MANAGEMENT_ENABLED is the setting
to enable/disable this feature.
Works by overriding these view methods:
- create
- delete
- perform_update
create and delete are overridden to raise a
PermissionDenied exception.
perform_update is overridden to check if any shared
fields are being modified, and raise a PermissionDenied
exception if so.
Additional changes:
Prevent sso conf from registering external authentication related settings if
AWX_DIRECT_SHARED_RESOURCE_MANAGEMENT_ENABLED is False
Signed-off-by: Seth Foster <fosterbseth@gmail.com>
Co-authored-by: Hao Liu <44379968+TheRealHaoLiu@users.noreply.github.com>
Support for AWS SNS notifications. SNS is a widespread service that is used to integrate with other AWS services(EG lambdas). This support would unlock use cases like triggering lambda functions, especially when AWX is deployed on EKS.
Decisions:
Data Structure
- I preferred using the same structure as Webhook for message body data because it contains all job details. For now, I directly linked to Webhook to avoid duplication, but I am open to suggestions.
AWS authentication
- To support non-AWS native environments, I added configuration options for AWS secret key, ID, and session tokens. When entered, these values are supplied to the underlining boto3 SNS client. If not entered, it falls back to the default authentication chain to support the native AWS environment. Properly configured EKS pods are created with temporary credentials that the default authentication chain can pick automatically.
---------
Signed-off-by: Ethem Cem Ozkan <ethemcem.ozkan@gmail.com>
* Always output awx logs to a file via otel
* That log file can always be later replayed into a product that
supports otlp at a later date.
* Useful when you find a problem that you need a time series DB to help
find and solve.
* Useful if a community member or customer has a problem where a time
series db would be helpful. You can take a "remote" users log and
replay it locally for analysis.
* Otherwise, settings value changes bleeds over into other tests.
* Remove django.conf settings import so that we do not accidentally
forget to use the settings fixture.
- switch to galaxy search API for determining if the version we want to publish already exist
- switch from github action variable to env var for easier copy and paste testing
We have not identify the root cause of wsrelay failure but attempt to make wsrelay restart itself resulted in postgres and redis connection leak. We were not able to fully identify where the redis connection leak comes from so reverting back to failing and removing startsecs 30 will prevent wsrelay to FATAL
```
ERRO[0000] path "/var/lib/awx/.config" exists and it is not owned by the current user
```
start to happen with podman 5
it seems that the config files are no longer needed removing it fixes the problem
skip update parent logic for 'waiting' on UnifiedJob
by not looking up "status_before" from previous instance
we save 2 to 3 expensive calls (the self lookup of old state, the lookup
of parent, and the update to parent if allow_simultaneous == False or status == 'waiting')
This change makes "wait: true" for jobs and syncs
look at the event_processing_finished instead of
finished field.
Right now there is a race condition where
a module might try to delete an inventory, but the events
for an inventory sync have not yet finished. We have a
RelatedJobsPreventDeleteMixin that checks for this condition.
bulk jobs don't have event_processing_finished so we just
use finished field in that case.
openssl 3.2.0 has incompatiblity issues with
the libpq version we are using, and causes
some C runtime errors:
"double free or corruption (out)"
see awx issue #15136
also this issue
github.com/conan-io/conan-center-index/pull/22615
once the libpq libraries on centos stream9 are
updated with the patch, we can unpin openssl
Signed-off-by: Seth Foster <fosterbseth@gmail.com>
If we don't have something in the cache when we call
get_by_natural_key, do an actual filtered query for it and cache the
results. We'll get more overall API calls this way, but they'll be
smaller and will happen while we are importing, not upfront.
* Adding CSRF Validation for schemas
* Changing retrieve of scheme to avoid importing new library
* check if CSRF_TRUSTED_ORIGINS exists before accessing it
---------
Signed-off-by: Bruno Sanchez <brsanche@redhat.com>
Currently the association box displays a
list of available instances/addresses that can
be peered to.
The pagination issue arises as follows:
- fetch 5 instances (based on page_size)
- filter these instances down based on some
criteria (like is_internal: false)
- show results
Filtering down the results inside of the fetch
method results in pagnation errors (may return fewer than
5, for example)
instead, do the filtering by API queries. That way the
pagination count will be correct.
Signed-off-by: Seth Foster <fosterbseth@gmail.com>
It's the year 2024: using -k as default in https URL schemes should be deprecated. (I have left one mention of it possibly being required if no CA available). Furtheremore, neither -XGET or -XPOST are required, as curl(1) well knows when to use which method.
- when re-establishing connection to db close old connection
- re-initialize WebSocketRelayManager when restarting asyncio.run
- log and ignore error in cleanup_offline_host (this might come back to bite us)
- cleanup connection when WebSocketRelayManager crash
* Fix bug where team could not be given read_role to other team
* Avoid unwanted triggers of parentage granting
* Restructure signal structure
* Fix another bug unmasked by team member permission fix
* Changes to live with test writing
* Use equality as opposed to string "in"
from Seth in review comment
Co-authored-by: Seth Foster <fosterseth@users.noreply.github.com>
---------
Co-authored-by: Seth Foster <fosterseth@users.noreply.github.com>
Adds new modules for CRUD operations on the
following endpoints:
- api/v2/role_definitions
- api/v2/role_user_assignments
- api/v2/role_team_assignments
Note: assignment is Create or Delete only
Additional changes:
- Currently DAB endpoints do not have "type"
field on the resource list items. So this modifies
the create_or_update_if_needed to allow manually
specifying item type.
Signed-off-by: Seth Foster <fosterbseth@gmail.com>
* Add new enablement settings from DAB RBAC
* Initial implementation of system auditor as role without testing
* Fix system auditor role, remove duplicate assignments
* Make the system auditor role managed
* Flake8 fix
* Remove another thing from old solution
* Fix a few test failures
* Add extra setting to disable custom system roles via API
* Add test for custom role prohibition
Develop ability to list permissions for existing roles
Create a model registry for RBAC-tracked models
Write the data migration logic for creating
the preloaded role definitions
Write migration to migrate old Role into ObjectRole model
This loops over the old Role model, knowing it is unique
on object and role_field
Most of the logic is concerned with identifying the
needed permissions, and then corresponding role definition
As needed, object roles are created and users then teams
are assigned
Write re-computation of cache logic for teams
and then for object role permissions
Migrate new RBAC internals to ansible_base
Migrate tests to ansible_base
Implement solution for visible_roles
Expose URLs for DAB RBAC
* Before, the optional url prefix feature required calling our
versioning version of reverse(). This worked _ok_ until we added more
and more urls from 3rd party apps. Those 3rd party apps do not call
our reverse(), writefully so.
* This implementation looks at the incoming request path. If it includes
the special optional prefix url, then we register ALL the urls WITH
the optional url prefix.
If the incoming request path does NOT contain the options url prefix
then we register ALL the urls WITHOUT the optional url prefix.
* Before this, we were registering BOTH sets of urls and then reverse()
+ the request as context to decide which url.
* Middleware classes can be instantiated multiple times in testing. To
make this a non-issue, move the init code for named urls out of the
middleware init and into the app init.
* This makes it easier to use other testing facilities, like
LiveServerTestCase, without having to mock the named url middleware
init.
The promote workflow recently failed. Since this was just a problem with our automation, it would be nice if we didn't have to do another release just to fix our tooling.
* Stage multi-arch awx image
- change CI to use `make awx-kube-build` instead of build playbook
- update staging CI to build and push multiarch awx image
- update doc to use `make awx-kube-build` to build awx image
- remove build playbook (no longer used)
* `drf_reverse()` was introduced here 1a75b1836e
* There is a comment about monkey patching. I can't find the monkey patch it is referencing.
* AWX `drf_reverse()` is a copy paste of this https://github.com/encode/django-rest-framework/blob/master/rest_framework/reverse.py#L32
* The only difference is DRF's version calls `preserve_builtin_query_params()`
* `preserve_builtin_query_params()` only does something if `api_settings.URL_FORMAT_OVERRIDE` is defined.
* We don't use `REST_FRAMEWORK.URL_FORMAT_OVERRIDE`
* Added docs for terraform credential/inventory source
* Updated screen captures for inventories and source to match wfjt example
* Added docs for terraform credential/inventory source
* Updated screen captures for inventories and source to match wfjt example
* Update docs/docsite/rst/userguide/inventories.rst
Co-authored-by: Aoki <lucasaoki@users.noreply.github.com>
* Revised per review feedback.
* Update docs/docsite/rst/userguide/inventories.rst
Co-authored-by: Helen Bailey <hakbailey@gmail.com>
---------
Co-authored-by: Aoki <lucasaoki@users.noreply.github.com>
Co-authored-by: Helen Bailey <hakbailey@gmail.com>
* Adjust the awx-manage script to make use of importlib
removing the deprecation warning.
* Synlink awx-manage in docker-compose
No longer need to rebuild docker-compose devel image to load change for `tools/docker-compose/awx-manage` in development environment
---------
Co-authored-by: Hao Liu <44379968+TheRealHaoLiu@users.noreply.github.com>
* Update DOCKER_COMPOSE command
docker-compose will stop being supported soon and this is causing CI flake setting DOCKER_COMPOSE default to `docker compose`
* Give AWX network a static name
* We didn't really make use of json formatting across the app. Remove
the special case json formatter. Instead, output all of the meta-data
associated with a job lifecycle event every time. Before, we tried to
only output this extra meta data when in DEBUG mode. It turns out this
information is smaller than we thought and more useful than we thought
so always output it.
* Previously, the params were passed without quotes and each directory
was being interpreted as a seperate command line flag.
* Added some structure around the error messages returned from
receptorctl so we can more easily decide how to handle each case. For
example, releasing the cleanup job from receptor doesn't absolutely
need to succeed because we have a periodic job that does that. In
fact, that is the thing that is making it fail .. but I digress.
Recent changes in awx and/or django ansible base cause the django
collectstatic command to fail when using an empty settings file.
Instead, use the defaults settings file from controller via
DJANGO_SETTINGS_MODULE=awx.settings.defaults
[linux/amd64 builder 13/13] RUN AWX_SETTINGS_FILE=/dev/null
SKIP_SECRET_KEY_CHECK=yes SKIP_PG_VERSION_CHECK=yes
/var/lib/awx/venv/awx/bin/awx-manage collectstatic --noinput --clear
Traceback (most recent call last):
(...)
django.core.exceptions.ImproperlyConfigured: settings.DATABASES is improperly
configured. Please supply the ENGINE value. Check settings documentation for
more details.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
Fix survey prompt presentation inconsistencies
Remove unnecessary conditional
This conditional always returned true. See the following warning: This condition will always return 'true' since JavaScript compares objects by reference, not value.
Fix schedule edit tests
Modification to settings
- Add hidden to indicate to UI_NEXT to hide the field
- Add warning_text to indicate to UI_NEXT to display the warning when specific setting is modified
- Address some non required field being marked as required
* Add setting for configuring optional URL prefix for /api
Add OPTIONAL_API_URLPATTERN_PREFIX setting
examples:
- if set to `''` (empty string) API pattern will be `/api`
- if set to 'controller' API pattern will be `/api` AND `/api/controller`
* Add dump_auth_config management cmd
- Dump SAML config from AWX to DAB authenticator config in json format
* Add dumping of LDAP settings
* add test for command
* Fix is_enabled
* fix command name typo
Co-authored-by: Hao Liu <44379968+TheRealHaoLiu@users.noreply.github.com>
* add fields to config, add name to data
* break out IDP values
* change test fields and value comparison
* edit help text, reformat settings
---------
Co-authored-by: jessicamack <jmack@redhat.com>
https://github.com/ansible/awx/pull/14910/files
introduced a bug where we no longer accept the right exceptions
when 2 job launch at the sametime and try to create jobevent table partition 1 of the job will fail
* Fixed mismatch between setuptools version in the makefile and requirements file
* Fix mismatch of versions in makefile and requirements
* Added maturin license
Prune dangle image periodically
pairs with https://github.com/ansible/ansible-runner/pull/1342
this fix the problem of us forcefully remove images when setting changing ee image that's being used in a job causing the job to fail
* Align Orign and Host header
* Before this change the Host: header was runserver. Seems to be set by
nginx upstream flow.
* After this change we explicitly set the Host: header
* More about CSRF checks ...
CSRF checks that Origin == Host. Think about how the browser works.
<browser goes to awx.com>
"I'm executing javascript that I downloaded from awx.com (ORIGIN) and
I'm making an XHR POST request to awx.com (HOST)"
Server verifies; Host: header == Origin: header; OK!
vs. the malicious case.
<hacker injects javascript code into google.com>
<browser goes to google.com>
"I'm executing javascript that I downloaded from google.com (ORIGIN)
and I'm making an XHR POST request to awx.com (HOST)"
Server verifies; Host: header != Origin: header; NOT OK!
* Update awx/settings/development.py
---------
Co-authored-by: Hao Liu <44379968+TheRealHaoLiu@users.noreply.github.com>
Enable VSCode debugger integration when attaching VSCode to with AWX docker-compose development environment container
- add debugpy launch target in `.vscode/launch.json` to enable launching awx processes with debugpy
- add vscode tasks in `.vscode/tasks.json` to facilitate shutting down corresponding supervisord managed processes while launching process with debugpy
- modify nginx conf to add django runserver as fallback to uwsgi (enable launching API server via debugpy)
* Credential Lookup with multiple types
Allow looking up a credential with one of multiple type IDs.
* Allow Azure cred for SCM
Allow selecting an Azure Resource Manager credential for Git-based SCMs.
This is in order to enable using Azure Service Principals for project updates.
* Implement Azure Service Principal Git
This adds support for using an Azure Service Principal for project updates.
---------
Signed-off-by: Patrick Uiterwijk <patrick@puiterwijk.org>
Add pip>=21.3 to dev requirement required for installing django-ansible-base in editable mode
https://peps.python.org/pep-0660/
PEP 660 – Editable installs for pyproject.toml based builds (wheel based)
Not auto-reload explicitly STOPPED processes
In development/debug workflow sometime we explicitly STOP processes this will make sure auto-reload does not start them back up
* add resources api to controller
* update setting
models are not the source of truth in AWX
* Force creation of ServiceID object in tests
* fix typo
* settings fix for CI
---------
Co-authored-by: Alan Rominger <arominge@redhat.com>
Tried to dig as to why we ever needed this and could not find the answer. We removed it and ran all the tests and the tests passed so assuming it's no longer needed.
`pytest awx/main/tests/docs --release=$(VERSION_TARGET)`
where --release is required breaks test discovery and running in vscode (from within the container)
* No harm in adding it to the list. If a JWT auth header is provided,
then process it (valid or not). If a JWT is not provided, move on to
the next auth.
Fix deadlock scenario where dispatcher child process stuck in reading from queue loop after dispatcher parent process decided to quit
Co-authored-by: Alan Rominger <arominge@redhat.com>
- avoid looping
- avoid using multiple files, only one should be provided and processed per type
- use first_found and variables to locate existing file
- skip if no file exists
As we do for control nodes, disable the
install_bundle endpoint for ingress nodes.
This can be done by checking if instance managed
is True.
Signed-off-by: Seth Foster <fosterbseth@gmail.com>
* authorized/not authorized tests for wsrelay endpoint
* not authorized test for web browser websockets
* skeleton of a test for authorized web browser websockets
The github workflow that we have set up for branch deletion doesn't work:
- the `on: delete` event does not support the `branches:` filter
- the `mode` flag for the aws_s3 module does not have `delete` as one
of the options. The proper option appears to be `delobj`.
* Channels doesn't really give you an interface to support per-endpoint
auth ... so this adds one.
* The web browser and node <--> node communication have different auth
needs.
* Previously, the nginx location would match on /foo/websocket... or
/foo/api/websocket... Now, we require these two paths to start at the
root i.e. <host>/websocket/... /api/websocket/...
* Note: We now also require an ending / and do NOT support
<host>/websocket_foobar but DO support <host>/websocket/foobar. This
was always the intended behavior. We want to keep
<host>/api/websocket/... "open" and routing to daphne in case we want
to add more websocket urls in the future.
* Always support cookies, session, and also allow rest_framework
configured auth methods over the browser websocket.
* The node -> node websocket auth remains locked down and unchanged
* Add a dev option for updater script to pin CI reqs
* Avoid removing git links for dev requirements
* Add dev to primary options
* Fix up sanitize git switch
This is a non-functional change. The way os_info is populated with docker info
and grep 'Operating System' breaks on podman and likely in other places. This
makes it work on both podman and docker, and it will continue to return the
exact same strings everywhere else.
Our migrations that touch roles tend to bring in our real models via
migration_utils.set_current_apps_for_migrations, and that can have
some undesirable side-effects.
* add ldap_auth mount and configure it
* added in key engines, userpass auth method, still needs testing
* add policies and fix ldap_user
* start awx automation for vault demo and move ldap
* update docs with new flags/new credentials
* Added LDAP support for HashiCorp Vault lookup credential
* Added LDAP support for HashiCorp Vault lookup credential
* Replaced graphics and updated missing fields.
* Added LDAP support for HashiCorp Vault lookup credential
* Replaced graphics and updated missing fields.
* Incorporated review feedback from @thedoubl3j and @djyasin.
* Fix UI peers_from_control_nodes
Fixes bug where peers_from_control_node was
greyed out in UI.
Additional changes:
- Make edit instance button only show for instances
with managed = False
- Make remove instance button only show for instances
with managed = False
- InstanceList selectable only for instances with
managed = False
---------
Signed-off-by: Seth Foster <fosterbseth@gmail.com>
* Organize metrics into their respective service
* Server per-service metrics on a per-service http server
* Increase prometheus client usage over our custom metrics fields
Listener Addresses is a better name to
emphasize these are routable addresses to
reach a listener service on the node.
Also removed expand toggle on the listener
addresses list items, as the expanded mode
had no additional information.
Signed-off-by: Seth Foster <fosterbseth@gmail.com>
Make protocol be blank on instance if there
is no canonical address for this instance.
It was defaulting to "tcp" before.
Signed-off-by: Seth Foster <fosterbseth@gmail.com>
In receptor address post-save method:
- Fixed detecting if address was missing
a link from control nodes
- Use InstanceLink create_or_update to prevent
adding duplicate InstanceLink source and target
peers
In instance serializer create_or_update,
delete receptor addresses first before doing
instance create or update. This ensures that we don't
trigger unnecessary post-save methods that might
attempt to manipulate receptor addresses that
will just be removed later.
Signed-off-by: Seth Foster <fosterbseth@gmail.com>
test_listener_port
test_peers_from_control_nodes
test_peers_from_control_nodes_without_listener_port
are covered in the following tests:
test_no_op
test_creates_canonical_address
test_deletes_canonical_address
test_updates_canonical_address
test_canonical_address_validation_error
Signed-off-by: Seth Foster <fosterbseth@gmail.com>
Add functional test case for inspecting
established receptor connections.
InstanceLink starts in ADDING state, and should
move to ESTABLISHED state if the connection
is detected in the receptor status output.
Signed-off-by: Seth Foster <fosterbseth@gmail.com>
If the port is explicitly set to null (causing any ReceptorAddress to
be deleted), then that's a validation error.
If the port is left off but a ReceptorAddress doesn't already exist,
we should not infer a port number and that is also a validation error.
Adds validation and a unit test to ensure:
- peers_from_control_nodes=True should fail if
listener_port is not set
- peers_from_control_nodes=False should be NOOP if
listener_port is not set
Signed-off-by: Seth Foster <fosterbseth@gmail.com>
Add validation to prevent any managed node
from modifying "peers" through the API
Peering from these nodes should be handled
by setting peers_from_control_nodes only.
Managed nodes are control nodes and
ingress hop nodes.
Signed-off-by: Seth Foster <fosterbseth@gmail.com>
InstanceLink target should not be null.
Should be safe to set to null=False, because we have
a custom RunPython method to explicitly set
target to a proper key.
Also, add new test to test_migrations
which ensures data integrity after migrating
the receptor address model changes.
Signed-off-by: Seth Foster <fosterbseth@gmail.com>
Adds remove_receptor_address to delete a
receptor address from the database
Also, enforce that only 1 canonical address
can be added to an instance via
the add_receptor_address command.
Signed-off-by: Seth Foster <fosterbseth@gmail.com>
- Add forwards method to create a receptor address
for any existing Instance that has listener_port defined
- Add forwards method to modify each InstanceLink object
that changes target to the newly created receptor addresses
This migration was implemented as follows:
1. Add a target_new to InstanceLink which is a foreign key
to ReceptorAddress
2. create receptor addresses
3. link to these receptor addresses using the target_new field
4. rename target_new to target
5. drop listener_port and peers_from_control_nodes from Instance
Signed-off-by: Seth Foster <fosterbseth@gmail.com>
If a Instance endpoint is patched with
{"peers_from_control_nodes" True}
but a listener_port is not defined on the instance,
or is part of the patch payload, do not create a
receptor address.
Only update or create a receptor address if listener_port
is set, either in the payload or already on the instance.
Signed-off-by: Seth Foster <fosterbseth@gmail.com>
command has 3 "targets", all of which should be a
ReceptorAddress object
- peers, disconnect, exact
Add logic to make sure each entry in those lists
are receptor addresses.
When creating InstanceLink objects, make sure
target is ReceptorAddress, not an Instance.
Signed-off-by: Seth Foster <fosterbseth@gmail.com>
After removing CRUD from receptor addresses, we need
to remove the module.
- remove receptor_address module
- Add listener_port to instance module
- Add peers_from_control_nodes to instance module
Signed-off-by: Seth Foster <fosterbseth@gmail.com>
Removes ability to directly create and delete
receptor addresses for a given node.
Instead, receptor addresses are created automatically
if listener_port is set on the Instance.
For example patching "hop" instance
with {"listener_port": 6667}
will create a canonical receptor address with port
6667.
Likewise, peers_from_control_nodes on the instance
sets the peers_from_control_nodes on the canonical
address (if listener port is also set).
protocol is a read-only field that simply reflects
the canonical address protocol.
Other Changes:
- rename k8s_routable to is_internal
- add protocol to ReceptorAddress
- remove peers_from_control_nodes and listener_port
from Instance model
Signed-off-by: Seth Foster <fosterbseth@gmail.com>
- Remove peer selection on add and edit instance
- Added conconcial name and order comlums names same on endpoints and
peers
- Other cleanup items
- rename name to instance name
Creates a non-deletable address that acts as
the "main" address for this instance.
All other addresses for that instance must
be non-canonical.
When listener_port on an instance is set, automatically
create a canonical receptor address where:
- address is hostname of instance
- port is listener_port
- canonical is True
Additionally, protocol field is added to instance to
denote the receptor listener protocol to use (ws, tcp).
The receptor config listener information is derived from
the listener_port and protocol information. Having a
canonical address that mirrors the listener_port ensures that
an address exists that matches the receptor config information.
Other changes:
- Add managed field to receptor address.
If managed is True, no fields on on this address can be edited
via the API.
If canonical is True, only the address cannot be edited.
- Add managed field to instance. If managed is True, users
cannot set node_state to deprovisioning (i.e. cannot delete node)
This change to our mechanism to prevent users from deleting
the mesh ingress hop node.
- Field is_internal is now renamed to k8s_routable
- Add reverse_peers on instance which is a list of instance IDs
that peer to this instance (via an address)
Signed-off-by: Seth Foster <fosterbseth@gmail.com>
Make receptoraddress list views
searchable by "address"
Other changes:
- Add help text to source and target of the
InstanceLink model
Signed-off-by: Seth Foster <fosterbseth@gmail.com>
- Add receptor_address module which allows
users to create addresses for instances
- Update awx_collection functional and integration
tests to support new peering design
Signed-off-by: Seth Foster <fosterbseth@gmail.com>
Updated existing tests to support the
ReceptorAddress model
- cannot peer to self
- cannot peer to node that is already peered to me
- cannot peer to node more than once (via 2+ addresses)
- cannot set is_internal True
Other changes:
Change post save signal to only call
schedule_write_receptor_config() when an actual change is detected.
Make functional tests more robust by
checking for specific validation error in the
response.
I.e. instead of just checking for 400, just for 400
and that the error message corresponds to the
validation we are testing for.
Signed-off-by: Seth Foster <fosterbseth@gmail.com>
- cannot peer to self
- cannot peer to instance that is already peered to self
Other changes:
- ReceptorAddress protocol field restricted to choices: tcp, ws, wss
- fix awx-manage list_instances when instance.last_seen is None
- InstanceLink make source and target unique together
- Add help text to the ReceptorAddress fields
Signed-off-by: Seth Foster <fosterbseth@gmail.com>
- websocket_path can only be set if protocol is ws
- is_internal must be False
- only 1 address per instance can have
peers_from_control_nodes set to True
Signed-off-by: Seth Foster <fosterbseth@gmail.com>
register_peers has inputs:
source: source instance
peers: list of instances the source should peer to
InstanceLink "target" is now expected to be a ReceptorAddress
For each peer, we can just use the first receptor address. If
multiple receptor addresses exist, throw a command error.
Currently this command is only used on VM-deployments, where
there is only a single receptor address per instance, so this
should work fine.
Other changes:
drop listener_port field from Instance. Listener port is now just
"port" on ReceptorAddress
Signed-off-by: Seth Foster <fosterbseth@gmail.com>
group_vars all.yaml changes:
- peer entry has two fields, address and port
- receptor_port is inferred from the first
receptor_address entry that uses protocol tcp
other changes:
ActivityStream now records when receptor_addresses
are peered to
Signed-off-by: Seth Foster <fosterbseth@gmail.com>
- write_receptor_config peers to ReceptorAddress entries
that have peers_from_control_nodes enabled
- peers_from_control_nodes and listener_port removed from Instance model
- peers_from_control_nodes added to ReceptorAddress model
- ReceptorAddress is now unique by address and protocol combination
- Write receptor config task is dispatched upon ReceptorAddress creation
or deletion, and when control node is first created
- InstanceLinkSerializer adds a target_address field and has logic
to grab the instance hostname associated with the peered ReceptorAddress
Signed-off-by: Seth Foster <fosterbseth@gmail.com>
Add post save and post delete hooks to
call write_receptor_config when
a receptor address is added / removed.
Add peers_from_control_nodes to
provision_instance
Signed-off-by: Seth Foster <fosterbseth@gmail.com>
- Add database contraints to make sure addresses
are unique
If port is defined:
address, port, protocol, websocket_path are unique together
if port is not defined:
address, protocol, websocket_path are unique together
- Allow deleting address via API
- Add ReceptorAddressAccess to determine permissions
- awx-manage add_receptor_address returns changed: True
if successful
* Not many, if any, folks use the notebook feature. It kind of goes in
and out of popularity. We've used it in the past when we work on
features that require visualization (i.e. network graphs, workflows).
Might as well keep it around in case we use it again.
* Added hop node information and beefed up exec nodes section.
* Made updates to the Topology viewer & dis/associate peers
* Added hop node information and beefed up exec nodes section.
* Made updates to the Topology viewer & dis/associate peers
* Fixed build and image rendering issues.
* Replaced graphic with many many nodes!
* Updated images with latest
* Incorporated review inputs from @fosterseth.
* Incorporated more review feedback from @fosterseth
* Resized graphic
* Resized graphic again
* Reduced image sizes to scale better on different browsers.
* Added section for using private image for default EE.
* Deleted .md file for execution nodes due to migration to RTD.
* Added hop node information and beefed up exec nodes section.
* Made updates to the Topology viewer & dis/associate peers
* Added hop node information and beefed up exec nodes section.
* Made updates to the Topology viewer & dis/associate peers
* Fixed build and image rendering issues.
* Replaced graphic with many many nodes!
* Updated images with latest
* Incorporated review inputs from @fosterseth.
* Incorporated more review feedback from @fosterseth
* Resized graphic
* Resized graphic again
* Reduced image sizes to scale better on different browsers.
* We introduced multi networks to our docker env. The code this replaces
would return both networks' ip addresses concatinated i.e.
'192.168.2.1192.168.2.3'.
* Add username and password to handle_auth and update exception message
Revise naming of ldap username and password
* Add url for LDAP and userpass to method_auth
* Add information regarding LDAP and username and password to credential plugins documentation
Revise ldap_auth to userpass_auth and revised exception to better reflect functionality
* Revise method_auth to ensure certs can be used with username and ensure namespace functionality is not hindered
Every so often we get connection timed out errors towards our HCP Vault
endpoint. This is usually when a larger number of jobs is running
simultaneously. Considering requests for other jobs do still succeed this
is probably load related and adding a retry should help in making this a
bit more robust.
* Put the awx node(s) on a service-mesh docker network so they can be
proxied to. Also put all the other containers on an explicit awx
network otherwise they can not talk to each other. We might could be
more surgical about what containers we put on awx but I just added all
of them.
Add support for receiving webhooks from Bitbucket Data Center, and add support for posting build statuses back
Note that this is very explicitly only for Bitbucket Data Center.
The entire webhook format and API is entirely different for Bitbucket Cloud.
* persist schedule prompt on launch fields when editing
* Merge job template default credentials with schedule overrides in schedule prompt
* rename vars for clarity
* handle undefined defaultCredentials
---------
Co-authored-by: Michael Abashian <mabashia@redhat.com>
AWX only sends Twilio notifications to one destination with the current version of code, but this is a bug. Fixed this bug for sending SMS to multiple destinations.
* Move awxkit import code into a pytest fixture to better control when
the import happens
* Ensure /awx_devel/awxkit is added to sys path before awxkit import
runs
* Basic export tests
* Added test that highlights a problem with running Schedule exports as
non-root user. We rely on the POST key in the OPTIONS response to
determine the fields to export for a resource. The POST key is not
present if a user does NOT have create privileges.
* Fixed up forwarding all headers from the API server back to the test
code. This was causing a problem in awxkit code that checks for
allowed HTTP Verbs in the headers.
* Narrow the scope of RBAC evaluations
* Update tests for RBAC method changes
* Simplify querset for credentials in org
* Fix call pattern to pass in team role obj
Due to https://github.com/ansible/awx/issues/7560
'omhttp' module for rsyslog will completely stop forwarding message to external log aggregator after receiving a 4xx error from the external log aggregator
This PR is an "workaround" for this problem by restarting rsyslogd after detecting that rsyslog received a 4xx error
When making changes to the application sometime you can accidentally cause FATAL state and cause the dev container to crash which will remove any ephemeral changes that you have made and is ANNOYING!
* Adding hosts bulk deletion feature
Signed-off-by: Avi Layani <alayani@redhat.com>
* fix the type of the argument
Signed-off-by: Avi Layani <alayani@redhat.com>
* fixing activity_entry tracking
Signed-off-by: Avi Layani <alayani@redhat.com>
* Revert "fixing activity_entry tracking"
This reverts commit c8eab52c2ccc5abe215d56d1704ba1157e5fbbd0.
Since the bulk_delete is not related to an inventory, only hosts which
can be from different inventories.
* get only needed vars to reduce memory consumption
Signed-off-by: Avi Layani <alayani@redhat.com>
* filtering the data to reduce memory increase the number of queries
Signed-off-by: Avi Layani <alayani@redhat.com>
* update the activity stream for inventories
Signed-off-by: Avi Layani <alayani@redhat.com>
* fix the changes dict initialiazation
Signed-off-by: Avi Layani <alayani@redhat.com>
---------
Signed-off-by: Avi Layani <alayani@redhat.com>
* Add TLS certificate auth for HashiCorp Vault
Add support for AWX to authenticate with HashiCorp Vault using
TLS client certificates.
Also updates the documentation for the HashiCorp Vault secret management
plugins to include both the new TLS options and the missing Kubernetes
auth method options.
Signed-off-by: Andrew Austin <aaustin@redhat.com>
* Refactor docker-compose vault for TLS cert auth
Add TLS configuration to the docker-compose Vault configuration and
use that method by default in vault plumbing.
This ensures that the result of bringing up the docker-compose stack
with vault enabled and running the plumb-vault playbook is a fully
working credential retrieval setup using TLS client cert authentication.
Signed-off-by: Andrew Austin <aaustin@redhat.com>
* Remove incorrect trailing space
Co-authored-by: Hao Liu <44379968+TheRealHaoLiu@users.noreply.github.com>
* Make vault init idempotent
- improve error handling for vault_initialization
- ignore error if vault cert auth is already configured
- removed unused register
* Add VAULT_TLS option
Make TLS for HashiCorp Vault optional and configurable via VAULT_TLS env var
* Add retries for vault init
Sometime it took longer for vault to fully come up and init will fail
---------
Signed-off-by: Andrew Austin <aaustin@redhat.com>
Co-authored-by: Hao Liu <44379968+TheRealHaoLiu@users.noreply.github.com>
Co-authored-by: Hao Liu <haoli@redhat.com>
* [CI] Reduce GHA timeouts from 6h default
The goal here is to never interfere with a real run (so most of the
timeout-minutes values seem rather high) but to avoid having 6h long
runs if something goes crazy and never ends.
Signed-off-by: Rick Elrod <rick@elrod.me>
* Do bash hackery instead
Signed-off-by: Rick Elrod <rick@elrod.me>
---------
Signed-off-by: Rick Elrod <rick@elrod.me>
* Fixing wsrelay connection loop
* The loop was being interrupted when reaching the return statements, causing a race condition that would make nodes remain disconnected from their websockets
* Added log messages for the previous return state to improve the logging from this state.
* Added logging for malformed payload
* Update awx/main/wsrelay.py
Co-authored-by: Rick Elrod <rick@elrod.me>
* Moved logmsg outside condition
---------
Co-authored-by: Lucas Benedito <lbenedit@redhat.com>
Co-authored-by: Rick Elrod <rick@elrod.me>
--location (-L) parameter will prompt curl to submit a new request if the URL is a redirect.
After moving to galaxy-NG without -L the curl falsely return 302 for any version
Co-authored-by: John Barker <john@johnrbarker.com>
* allow pytest --migrations to succeed
* We actually subvert migrations from running in test via pytest.ini
--no-migrations option. This has led to bit rot for the sqlite
migrations happy path. This changeset pays off that tech debt and
allows for an sqlite migration happy path.
* This paves the way for programatic invocation of individual migrations
and weaving of the creation of resources (i.e. Instance, Job Template,
etc). With this, a developer can instantiate various database states,
trigger a migration, assert the state of the db, and then have pytest
rollback all of that.
* I will note that in practice, running these migrations is dog shit
slow BUT this work also opens up the possibility of saving and
re-using sqlite3 database files. Normally, caching is not THE answer
and causes more harm than good. But in this case, our migrations are
mostly write-once (I say mostly because this change set violates
that :) so cache invalidation isn't a major issue.
* functional test for migrations on sqlite
* We commonly subvert running migrations in test land. Test land uses
sqlite. By not constantly exercising this code path it atrophies. The
smoke test here is to continuously exercise that code path.
* Add ci test to run migration tests separately, they take =~ 2-3
minutes each on my laptop.
* The smoke tests also serves as an example of how to write migration
tests.
* run migration tests in ci
Currently if you cleanup docker volume for vault and bring docker-compose development back up with vault enabled we will not initialize vault because the secret files still exist.
This change will attempt to initialize vault reguardless and update the secret file if vault is initialized
Adding the possibility to decode base64 decoded strings to Delinea's Devops Secret Vault (DSV).
This is necessary as uploading files to DSV is not possible (and not meant to be) and files should be added base64 encoded.
The commit is making sure to remain backward compatible (no secret decoding), as a default is supplied.
This has been tested with DSV and works for secrets that are base64 encoded and secrets that are not base64 encoded (which is the default).
Signed-off-by: Steffen Scheib <sscheib@redhat.com>
* Set subscription type as developer for developer subscriptions.
Signed-off-by: Tong He <the@redhat.com>
* Set subscription type as developer for developer subscription manifests.
Signed-off-by: Tong He <the@redhat.com>
* Remedy the wrong character to assign value.
Signed-off-by: Tong He <the@redhat.com>
* Reformat licensing.py by black.
Signed-off-by: Tong He <the@redhat.com>
---------
Signed-off-by: Tong He <the@redhat.com>
* Setting credential_type as required
* Added test for missing credential_type in credential module
* Corrected test assertion
---------
Co-authored-by: Lucas Benedito <lbenedit@redhat.com>
* add alt to images in workflow_templates.rst
Signed-off-by: Ratan Gulati <ratangulati.dev@gmail.com>
* add alt to images in workflow_templates.rst
Signed-off-by: Ratan Gulati <ratangulati.dev@gmail.com>
* Update workflow_templates.rst
* Revised proposed alt text for workflow_templates.rst
---------
Signed-off-by: Ratan Gulati <ratangulati.dev@gmail.com>
Co-authored-by: TVo <thavo@redhat.com>
This fixes a bug where jobs within a workflow job were not canceled
when the workflow job was canceled by the user
The fix is to submit the cancel request as a part of the
transaction that WorkflowManager commits its work in
this requires that we send the message without expecting a reply
so this changes the control-with-reply cancel to just a control function
* convert to valid type for serializer
* check that extra_vars are in request
* remove doubled line
* add integration test for change
* move change to the ad_hoc_command module
Signed-off-by: jessicamack <jmack@redhat.com>
* fix imports
Signed-off-by: jessicamack <jmack@redhat.com>
---------
Signed-off-by: jessicamack <jmack@redhat.com>
* Fix Boolean values defaulting to False in collection
* Remove null values in other cases, fix null handling for WFJT nodes
* Only remove null values if it is a boolean field
* Reset changes to WFJT node field processing
* Use test content from sean-m-sullivan to fix lookups in assert
Web container does not need to wait for migration
if the database is running and responsive, but migrations have not finished, it will start serving, and users will get the upgrading page
wait-for-migration prevent nginix and uwsgi from starting up to serve the "upgrade in progress" status page
There are a number of changes here:
- Abstract out a GHA composite action for running the dev environment
- Update the e2e tests to use that new abstracted action
- Introduce a new (matrixed) job for running collection integration
tests. This splits the jobs up based on filename.
- Collect coverage info and generate an html report that people can
download easily to see collection coverage info.
- Do some hacks to delete the intermediary coverage file artifacts
which aren't needed after the job finishes.
Signed-off-by: Rick Elrod <rick@elrod.me>
Might help to install receptor last,
that way when nodes are first connected to the mesh
they already have podman installed and can potentially
run jobs. Otherwise it might be possible for controller
to launch jobs against nodes that aren't fully set up.
Add a check_peers_changed() utility method
to determine if peers in attrs matches
the current instance peers.
Other changes:
- Set ip_address default to "", and do not
allow null.
Get rid of PeersSerializer and just use SlugRelatedField,
which should be more a straightforward approach.
Other changes:
- cleanup code related to the already-removed api/v2/peers
endpoint
- add "hybrid" node type into more instance_peers test cases
API changes
- cannot change peers or enable
peers_from_control_nodes on VM deployments
- allow setting ip_address
- use ip_address over hostname in the generated
group_vars/all.yml
- Drop api/v2/peers endpoint
DB changes
- add ip_address unique constraint, but ignore "" entries
Other changes
- provision_instance should take listener_port option
Tests
- test that new controls doesn't disturb other peers
relationships
- test ip_address over hostname
Dynamically flipping from Established
to Disconnected is not the intended
usage of InstanceLink State.
- Link state starts in Adding and becomes
Established once any control node first sees the link
is in the status KnownConnectionCosts
inspect_established_receptor_connections should
not change link state is current state is Removing.
Other changes:
- rename inspect_execution_nodes to inspect_execution_and_hop_nodes
- Default link state is Adding
- Set min listener_port value to 1024
- inspect_established_receptor_connections now
runs as part of cluster_node_heartbeat task
rename migration function set_peers_from_control_nodes_true to automatically_peer_from_control_plane
import settings and only run function if settings.IS_K8S is true
set listener_port for control nodes to None
Add hop node support to awx collections
- add peers and peers_from_control_nodes fields
- show new node_type "hop"
- add tests for adding hop nodes via collections
Co-authored-by: Seth Foster <fosterseth@users.noreply.github.com>
Add Disconnected link state
introspect_receptor_connections is a periodic
task that examines active receptor connections
and cross-checks it with the InstanceLink info.
Any links that should be active but are not
will be put into a Disconnected state. If
active, it will be in an Established state.
UI - Add hop creation and peers mgmt (#13922)
* add UI for mgmt peers, instance edit and add
* add peer info on detail and bug fix on detail
* remove unused chip and change peer label
* rename lookup, put Instance type disable on edit
---------
Co-authored-by: tanganellilore <lorenzo.tanagnelli@hotmail.it>
Add full path for the mv command so that the command can be run from ui_next and from project root.
Additionally move the rename of file to src build step.
Dispatcher refactoring to get pg_notify publish payload
as separate method
Refactor periodic module under dispatcher entirely
Use real numbers for schedule reference time
Run based on due_to_run method
Review comments about naming and code comments
We introduce a thin wrapper over Django's RedisCache so that the functionality of DJANGO_REDIS_IGNORE_EXCEPTIONS is retained while still being able to drop the django-redis dependency.
Credit to django-redis's implementation for the idea of using a decorator for this and abstracting out the exception handling logic.
Signed-off-by: Rick Elrod <rick@elrod.me>
This fixes https://github.com/ansible/awx/issues/14245 which has
more information about this issue.
This change addresses both:
- A clashing signal handler (registering a callback to fire when
the task manager times out, and hitting that callback in cases
where we didn't expect to). Make dispatcher timeout use
SIGUSR1, not SIGTERM.
- Metrics not being reported should not make us crash, so that is
now fixed as well.
Signed-off-by: Rick Elrod <rick@elrod.me>
Co-authored-by: Alan Rominger <arominge@redhat.com>
On some systems, /bin/sh is a bash symlink and running it will launch
bash in sh compatibility mode. However, bash-specific syntax will still
work in this mode (for example using == or pipefail).
However, on systems where /bin/sh is a symlink to another shell (think:
Debian-based) they might not have those bashisms.
Set the shell in the Makefile, so that it uses bash (since it is already
depending on bash, even though it is calling it as /bin/sh by default),
and add a shebang to pre-commit.sh for the same reason.
Signed-off-by: Rick Elrod <rick@elrod.me>
Added persistent storage
Auto-create vault and awx via playbooks
Create a new pattern for custom containers where we can do initialization
Auto-install roles needed for plumbing via the Makefile
When we close/cancel a connection to a web node, give the task time to
clean up after itself and cleanly exit. Otherwise, the Python GC might
clean up the task too early and this leads to ugly log messages like
this: "Task was destroyed but it is pending!"
Signed-off-by: Rick Elrod <rick@elrod.me>
NUL characters are not allowed in text fields in the database
We used to strip them out of stdout but the exception changed
And we want to be sure to strip them out of JSONBlob fields
- Nix an unused function from run_dispatcher. This stopped being used
in 558e92806b but was never removed.
- Fix a typo in run_ws_heartbeat: hearbeat -> heartbeat that has existed
since the beginning of this daemon.
Signed-off-by: Rick Elrod <rick@elrod.me>
Right now we only enable queuing on the rsyslog main_queue. This adds a
parameter to also enable it on the omhttp output action. As omhttp can
take time to process messages (e.g. blocking on the result of its HTTP
requests), this change allows for queuing messages up and hopefully
preventing some messages from getting lost when the log server is slow
to respond.
Signed-off-by: Rick Elrod <rick@elrod.me>
This is expected to free up 4 additional database connections per traditional node
compare to roughly 12 in total before this change
Out of these 3 are accomplished by using existing connection for recently added services
then 1 is obtained by closing the connection for the idle callback receiver main process
Signed-off-by: jessicamack <jmack@redhat.com>
Co-authored-by: jessicamack <jmack@redhat.com>
This adds a handful of metrics to /api/v2/metrics/ recorded from the dispatcher main process
Adds logic in the dispatcher period tasks to calculate these for the last collection interval
Reports worker count, task count, scale up events, and availability
Add data to demo grafana dashboard
This fixes two different exceptions in wsrelay.
* One resulted from heartbeet getting ability in #13858 to gracefully
shut down. When we saw the message come through, we didn't fully
clean up the connection to the web node.
* The second resulted when Redis disappeared. We still want to exit in
that case, but it's better to log a message and exit gracefully
instead of crashing out.
Signed-off-by: Rick Elrod <rick@elrod.me>
raise Exception in the case that return code is non-zero
this approach has shown itself to be the most consistently reliable across multiple ecosystems
This was caused by an incorrect parent_key ref from label to job
also applies to workflow_job labels
This fixes a regression introduced by a recent merge (#13957)
Due to dependency issues specifically around upgrading to Django 4.2, we
cannot feasibly have a dependency on psycopg2 and psycopg3. The only
place that was currently using psycopg3 was wsrelay.
Change wsrelay to use the asyncpg library and psycopg2 instead.
Tested locally on kind with a dev build of awx.
Signed-off-by: Rick Elrod <rick@elrod.me>
This was making host sub-list views non-functional
specifically for constructed and smart inventory
views would always return 0 results before this fix
In a prior merge, we added the ability to slap filter_read_permission = False on a view to get a certain functionality where it didn't filter a sublist the view is showing.
This logic already existed in a highly duplicated form among a number of views, so this deletes those methods in favor of the flag.
* Fix organization not showing all galaxy credentials for org admin
* Add basic test to ensure counts
* refactored approach to allow removal of redundant code
* Allow configurable prefetch_related
* implicitly get related fields
* Removed extra queryset code
ad_hoc_command_cancel really can no longer timeout on a cancel (it happens sub second) and remove unneeded block
Modified all test to respect test_id parameter so that all tests can be run togeather as a single ID
Fix a check in group since its group2 is deleted from being a sub group of group1
The UI now allows to propage sub groups to the inventory which we may want to support within the collection
Only run instance integration test if we are running on k8s and assume we are not by default
Fix hard coded names in manual_project
* Use separate module for test settings
* Further refine some pre-existing comments in settings
* Add CACHES to setting snapshot exceptions to accommodate changed load order
- Change default PYTHON in Makefile to be ranked choice
- Fix `PYTHON_VERSION` target that expects just a word
- Use native GNU Make `$(subst ,,)` instead of `sed`
- Add 'version-for-buildyml' target to simplify ci
If I understand correctly, this change should make
'$(PYTHON)' work how we want it to everywhere. Before
this change, on develpers' machines that don't have
a 'python3.9' in their path, make would fail. With this
change, we will prefer python3.9 if it's available, but
we'll take python3 otherwise.
* Avoid recursive include of DEFAULT_SETTINGS, add sanity test to avoid similar surprises
* Implement review comments for more clear code order and readability
* Clarify comment about order of app name, which is last in order so that it can modify user settings
we link awx.egg-link from `tools/docker-compose/awx.egg-link` to `/tmp/awx.egg-link` than we move `/tmp/awx.egg-link` to `/var/lib/awx/venv/awx/lib/python3.9/site-packages/awx.egg-link`
bonus... now we dont have to set PYTHON=python3.9
Linking launch script and supervisor conf file in kube development environment so we no longer have to rebuild kube devel images for superviosr conf file and launch script changes
- use different dockerfile for awx_devel and awx image
- make all Dockerfile* targets PHONY (bc its cheap to run)
- fix HEADLESS not working for awx-kube-build
In web/task split deployment web and task container no longer share the same redis cache
In the original code we use redis cache to pass the list of sub objects that need to be copied to the new object
In this PR we extracted out the logic that computes the sub_object_list and move it into deep_copy_model_obj task
* Introduce new method in settings, import in-line w NOQA mark
* Further refine the app_name to use shorter service names like dispatcher
* Clean up listener logic, change some names
* Fixed#13402 allow user defined key retrieval from CYBR
* Add default value to object_property
* Raise ValueError if object_property not in response
* Raise KeyError instead of ValueError
When the API request is for /inventories/id use that as the URL in the
API response. When the request is for /constructed_inventories/id use
that.
Signed-off-by: Rick Elrod <rick@elrod.me>
these make targets are for starting the different daemons within the kube/docker development environment updating the name to make it better reflect their intention
also added comments above the make target to describe what they do
note: these comments show up when run `make help`
previously this is used so that task running in the task container can reach into the web container to restart rsyslog
now that the web container and task container are split there's no longer a way to do that so i renamed this env var to reference where it will now do
which is pointing to the supervisor conf file of the current running container
launch_awx.sh that this PR rename is also now only use for launching awx web container renaming to reflect it's purpose
also remove the no longer needed creation of rsyslog conf as rsyslog is no longer in the web container
Update Dockerfile.j2
supervisor.conf.j2 file is the template for supervisor.conf file for the web container rename to supervisor_web.conf make it more clear that it is use for the web container
get_local_queuename will return the pod name of the instance
now that web and task are in different pods when web container queue a task it will be put into a queue without as task worker to execute the task
Works by adding a dedicated producer in wsrelay that looks for
local django channels message with group "metrics". The producer
sends this to the consumer running in the web container.
The consumer running in the web container handles the message by
pushing it into the local redis instance.
The django view that handles a request at the /api/v2/metrics
endpoint will load this data from redis, format it, and return the
response.
We internally manipulate the message payload a bit (to know whether we
are originating it on the task side or the web system is originating
it). But when we get the message, we actually get a reference to the
dict containing the payload.
Other producers in wsrelay might still be acting on the message and
deciding whether or not to relay it. So we need to manipulate and send a
*copy* of the message, and leave the original alone.
Signed-off-by: Rick Elrod <rick@elrod.me>
We no longer need to do this from wsrelay, as it will automatically try
to reconnect when it hears the next beacon from heartbeet.
This also cleans up the logic for what we do when we want to delete a
node we previously knew about.
Signed-off-by: Rick Elrod <rick@elrod.me>
* Created new rsyslog launch file.
* Rsyslog conf work.
* Refining how we're calling rsyslog conf.
* Removed rsyslog so it no longer launches in the web container.
* Added the new launch_awx_rsyslog.sh to the /usr/bin
* add management command and logging for new daemon
* switch tasks over to calling pg_notify
* add daemon to docker-compose and supervisor
* renamed handle_setting_changes and moved notify call
* removed initial rsyslog configure from dispatcher
* add logging and clear cache before reconfigure
* add notify to delete
* moved pg_notify to own function
* update tests impacted by rsyslog change
* changed over to new pg_notify method
Signed-off-by: Jessica Mack <jmack@redhat.com>
* Save facts on model for original host
Redirect to original host for ansible facts
Use current inventory hosts for facts instance_id filter
Thanks for Gabe for identifying this bug
* Fix spelling of queryset
Co-authored-by: Rick Elrod <rick@elrod.me>
* Fix sign error with facts expiry - from review
---------
Co-authored-by: Rick Elrod <rick@elrod.me>
* Backlink events to real hosts and summaries to both hosts
* Prevent error when original host is deleted during job run
* No duplicate entries, review suggestion from Rick
* Change word tense in help text, dict style adjustments
From code review
Co-authored-by: Rick Elrod <rick@elrod.me>
* Back out new variable for constructed host id
---------
Co-authored-by: Rick Elrod <rick@elrod.me>
- When updating, we need the original object so we can make sure we
aren't changing things we shouldn't be.
- We want to allow source_vars and limit, but not much else.
- We want to block everything else (at least, if it doesn't match what
is in the original object...to allow the collection to work properly).
- Add two functional tests.
Signed-off-by: Rick Elrod <rick@elrod.me>
Including changes to our custom Ordered m2m field which previously broke
if the source and target model was the same.
Signed-off-by: Rick Elrod <rick@elrod.me>
Co-authored-by: Alan Rominger <arominge@redhat.com>
- add kind 'constructed' to inventory module
- add 'input_inventories' field to inventory module
Co-authored-by: Rick Elrod <rick@elrod.me>
Signed-off-by: Rick Elrod <rick@elrod.me>
* Add constructed inventory docs and do minor field updates
Add verbosity field to the constructed views
automatically set update_on_launch for the auto-created constructed inventory source
Make the GET function work at most basic level
Basic functionality of updating working
Add functional test for the GET and PATCH views
Add constructed inventory list view for direct creation
Add limit field to constructed inventory serializer
move limit field from InventorySourceSerializer to InventorySourceOptionsSerializer (#13464)
InventorySourceOptionsSerializer is the parent for both InventorySourceSerializer and InventoryUpdateSerializer
The limit option need to be exposed to both inventory_source and inventory_update
Co-Authored-By: Hao Liu <44379968+TheRealHaoLiu@users.noreply.github.com>
this allow you to pre-build your ui_next outside of container and it won't try to rebuild when you build awx image
`make ui-next` will no longer rebuild if awx/ui_next/build exist
- move placeholder index_awx.html out of ui_next build dir
- copy index_awx.html to build dir during development bootstrap if UI_NEXT has not been build
* Fix race with heartbeat and reaper logic
* Fix tests to fail when over drift over heartbeat time
* replaced modified with started time for reap() code and added test
* fixed logic bug and cleaned up tests
* Added comments to tests to call out reasoning
- Add new makefile for building ui_next
- Add setting to toggle ui_next
- Add URL path for displaying ui_next
- Update collectstatic and template dir config to serve ui_next
The class that contained these tests wasn't named Test*, so the tests in
it weren't running. Fix that and fix the tests in it so that they pass.
Signed-off-by: Rick Elrod <rick@elrod.me>
* Adds support for a pseudolocalization query param to check to see whether a string has been marked for translation
Adds support for a pseudolocalization query param to check to see whether a string has been marked for translation
* Adds support for passing a lang param to force rendering in a particular language
* Remove unused import
* adding roles to instance groups
added ResourceMixin to Instancegroup and changed the filtered_queryset
* added necessary changes to rebuild relationship between IG and roles
* added description to InstanceGroupAccess
* preliminary ui plug for demo purposes
* preliminary ui plug for demo purposes
added inventory special logic for use_role to allow attaching instance groups
added more tests to handle those cases
* Add access_list to InstanceGroup
* scratch branch to test migration work
* refactored to shorten logic
* Added migration and am removing logic that enabled Org admin permissions
* Add Obj admin role to JT, Inv, Org
* Changed tests to reflect new permissions
* refactored some of the tests
* cleaned up more tests and reworded help on InstanceGroupAccess
* Removed unnecessary delete of Route for instance group perms change
* Fix UI tests and migration
* fixed permissions on prompt for InstanceGroups
* added related object roles endpoint
* added ui/api function for options instance_groups
* separate the migrations in order to avoid issues with migrations not being finished
* changed migrations parent class to disable the activity stream error in migrations
* Added logging to migration as activitystream is disabled
* added clarifying comment to jobtemlateaccess and linted UI addition
* renamed migrations to avoid collisions
* Rename migrations to avoid collisions
MOVE the config template v1 to v2
delete other v1 views since v1 is deleted
the host fact gather collection over time was removed
also the job start view was removed
Insights integration was changed and the host insights
view no longer exists
Slightly modernize config help
- periodically ping postres on port 5432 and only start
migrations if successful.
- prevents crash loop when attempting migrations before
postgres is ready.
linting
linting again
Use the correct role on org permission check
Co-authored-by: Elijah DeLee <kdelee@redhat.com>
Update docs/bulk_api.md
Co-authored-by: Elijah DeLee <kdelee@redhat.com>
Update docs/bulk_api.md
Co-authored-by: Elijah DeLee <kdelee@redhat.com>
Update awx/main/access.py
Co-authored-by: Elijah DeLee <kdelee@redhat.com>
Update awx/main/access.py
Co-authored-by: Elijah DeLee <kdelee@redhat.com>
Update docs/bulk_api.md
Co-authored-by: Alan Rominger <arominge@redhat.com>
fix collection test (#19)
improve readability of through model object creation (#18)
lower num jobs/hosts in tests (#20)
we can test query scaling at lower numbers, to reduce
load in tests. We suspect this was causing some flake
in the tests on PRs
adjust the num of queries
* Bulk launch serializer RBAC and code structure review
Use WJ node as base in bulk job launch child
remove fields we get for free this way
Minor translation marking
Consolidate bulk API permission methods
split out permission check for each UJT type
Code consolidation for org check method
add a save before starting the workflow job
Make the max host default 100. We are seeing with moderate number of hosts i.e. 500 hosts having a few host variable each runs into max size of nginx message and nginx rejects the request.
we are therefor keeping the value small so that it doesn't fail with decent number of host variables as well.
remove the 999 hosts test because the default max is 100
fix the credential check
fix the instance groups and execution env permission checks
* evaluate max bulk settings in validate...
instead of in class attribute. This makes them load at request time
instead of at app start up time, which fixes problems with test
as well as I think will be better user experience if admins
actually do change the setting it will apply without restarting
django app on each instance
* improve OPTIONS by not manually declaring feilds
alan pointed this out
* fix access problems and add Add bulk job max settings to api
filter workflow job nodes better
This will both improve performance by limiting the queryset for the node
sublists as well as fix our access problem.
override can_read instead of modify queryset in access.py
We do this because we are not going to expose bulk jobs to the list
views, which is complicatd and has poor performance implications.
Instead, we just care about individual Workflows that clients get linked
to not being broken.
fix comment
remove the get functions from the conf.py for bulk api max value
comment the api expose of the bulk job variables
reformt conf.py with make black
trailing space
add more assertion to the bulk host create test
check label permission and fix lint (#13)
* set created by and launch type correctly
This makes "launched_by" get computed right in the tests.
Mysteriously this seemed to work from API browser, but
this seems more correct to have it work this way, and makes
tests actually work.
For "manual" launch types the attribute used to populate "launched_by"
is "created_by". And we already have "is_bulk_job" to indicate that the
job is a bulk job. So lets just use this.
* check label is in an organization you can read
* add assertions around access to resulting job
there is a problem getting the job w/ the user that launched it
add more assertions to bulk tests (#11)
dig more into the results and assert on results
also, use a fixture that already implemented the "max queries" thing
fix ansible collection sanity tests (#12)
Various fixes
- Don't skip checking resource RBAC permissions for admins
Necessary to handle bad input, e.g. providing a
unified_job_template id that doesn't exit
- In awxkit, only "walk" if we get 'url' in the result
- Bulk host create should return url pointing to inventory,
not inventory/hosts
dont do org check for superuser
fix the api-lint
fix the api-lint
add the descrition to the bulk job launch module params
add the description for the description field
add the description for the description field
add docs for the bulk api
fix the models on the bulk api serializers
fix some of the issues highlighted in the code review
better use of role model
remove comments
better error message
revert the PrimaryKeyRelatedField for unified_job_template and inventory
Enabled the params bulk job
make black
make black again
Fixed inventory and organization input params for bulk modules
add collection integration tests
Fix cli return errors
fix test completeness
return more context for bulk host create
now return list of minimal info about host objects
[
{
"name": "lakjdsafoiaweirnladlk",
"enabled": true,
"instance_id": "",
"description": "",
"variables": "",
"id": 4593,
"url": "/api/v2/hosts/4593/",
"inventory": "/api/v2/inventories/1/"
}
]
Updated tests, but needed to work around some weird behavior with
sqlite. Apparently it behaves differently around assigning ID's to the
result of bulk_create and that is messed up my use of `reverse` to look
up the url of the hosts
make black changes
increase the number of queries to 30
fix the flake failure
add functional changes for bulk job launch and some minor fixes
pull changes
we needed to inherit from GenericAPIView to get the options to render
correctly
q!
add execution env support
add organization validation to the workflowjob
Update awx/api/serializers.py
Co-authored-by: Elijah DeLee <kdelee@redhat.com>
Update awx/api/serializers.py
Co-authored-by: Elijah DeLee <kdelee@redhat.com>
Provide a view that allows users to launch many jobs with one POST
request. Under the hood, this creates a workflow with a number of jobs
all in a "flat" structure -- much like a sliced job, but with arbitrary
"joblets".
For ~ 100 nodes looking at ~ 200 some queries, which is more than the
proof of concept, but still an order of magnitude better than individual
job launches.
Still more work to implement other m2m objects, and need to address what
Organization should be assigned to a WorkflowJob launched by a BulkJob.
They need this so they can step into the workflow_job_nodes and get the
status of all the containing jobs.
Also want to test when there are MANY job templates etc in the system
because the querires like
UnifiedJobTemplate.accessible_pk_qs(request.user, 'execute_role').all()
queries scare me, seems like it could be a lot of things.
use "many=True" instead of ListField
Still seeing identical number of queries when creatin 100 jobs, going to
investigate more
only validate type in nested serializer
then, we actually get the database object after we do the RBAC checks
This drops us down from hundreds of queries to launch 100 jobs,
to less than 100 queries to launch 100 jobs (I got around 24 queries to
launch 100 jobs with credentials)
pave way for more promptable things
add "limit" as possible prompt on launch to bulk jobs
re-organize how we add credentials to pave way for the other m2m items
not having to repeat too much code
add labels to the bulk job
add the other fields to the workflowjobnode
move urls around
allow system admins, org admins, and inventory admins to bulk create
hosts.
Testing on an "open" licensed awx as system admin, I created 1000 hosts with 6 queries in ~ 0.15 seconds
Testing on an "open" licensed awx as organization admin, I created 1000 hosts with 11 queries in ~ 0.15 seconds
fix org max host check
also only do permission denied if license is a trial
add /api/v2/bulk to list bulk apis available
add api description templates
One motiviation to not take a list of hosts with mixed inventories is to
keep things simple re: RBAC and keeping a constant number of queries.
If there is great clamor for accepting list of hosts to insert into
arbitrary different inventories, we could probably make it happen - we'd
need to pop the inventory off of each of the hosts, run the
HostSerializer validate, then in top level BulkHostCreateSerializer
fetch all the inventories/check permissions/org host limits for those
inventories/etc. But that makes this that much more complicated.
add test for rbac access
test also helped me find a bug in a query, fixed that
add test to assert num queries scales as expected
also move other test to dedicated file
also test with super user like I meant to
record activity stream for the inventory
this records that a certain number of hosts were added by a certain user
we could consider if there is any other additional information we want
to include
* Azure AD users should not be able to change their password
* Multiple auth changes
Moving get_external_user function into awx.sso.common
Altering get_external_user to not look at current config, just user object values
Altering how api/conf.py detects external auth config (and making reusable function in awx.sso.common)
Altering logic in api.serializers in _update_pasword to use awx.sso.common
* Adding unit tests
---------
Co-authored-by: John Westcott IV <john.westcott.iv@redhat.com>
awx-kube-build and docker-compose-build share the same Dockerfile
if u run awx-kube-build than docker-compose-build in succession the second command wont run the Dockerfile target and cause the image to be built with the incorrect Dockerfile
from @relrod
`head` will close the input fd when it no longer needs it (or exits). find will try to write to the closed fd and somewhere along the way, it will receive SIGPIPE as a result. This is why `yes | head -5 ` doesn't run forever.
docker-compose v1 is EOL since April 2022 and hasn't received any
updates since May 2021. docker compose v2 is a complete rewrite in
Go which acts as a plugin for the main docker application.
The syntax is the same, but only the `compose` command differs.
This commit adds the ability to override the default `docker-compose`
command using `make DOCKER_COMPOSE='docker compose'`.
Signed-off-by: Tom Siewert <tom@siewert.io>
Providing defaults for API parameters where the API already provides
defaults leads to some confusing scenarios, because we end up always
sending those collection-defaulted fields in the request even if the
field isn't provided by the user.
For example, we previously set the `scm_type` default to 'manual' and
someone using the collection to update a project who does not explicitly
include the `scm_type` every time they call the module, would
inadvertently change the `scm_type` of the project back to 'manual'
which is surprising behavior.
This change removes the collection defaults for API parameters, unless
they differed from the API default. We let the API handle the defaults
or otherwise ignore fields not given by the user so that the user does
not end up changing unexpected fields when they use a module.
Signed-off-by: Rick Elrod <rick@elrod.me>
According to latest documentation, role and value are now "one or more" fields. So they both need to be arrays. Entering the json data as you have in this article doesn't work. But when I added the brackets, it then worked.
Thank you
* Moving reconcile_users_org_team_mappings into common library
* Renaming pipeline to social_pipeline
* Breaking out SAML and generic Social Auth
* Optimizing SMAL login process
* Moving extraction of org in teams from backends into sso/common.create_orgs_and_teams
* Altering saml_pipeline from testing
Prefixing all internal functions with _
Modified subfunctions to not return values but instead manipulate multable objects
Modified all functions to not add duplicate orgs to the orgs_to_create list
* Updating the common function to respect a teams organization name
* Added can_create flag to create_org_and_teams
This made testing easier and allows for any adapter with a flag the ability to simply pass it into a function
* Multiple changes to SAML pipeline
Removed orgs_to_create from being passed into user_team functions, common create orgs code will add any team orgs to list of orgs automatically
Passed SAML_AUTO_CREATE_OBJECTS flag into create_org_and_teams
Fix bug where we were looking at values instead of keys
Added loading of all teams if remove flag is set in update_user_teams_by_saml_attr
* Moving common items between SAML and Social into a 'base'
* Updating and adding testing
* Renamed get_or_create_with_default_galaxy_cred to get_or_create_org_...
_update_m2m_from_groups must return None if remove_* is false or empty,
because None indicates that the user permissions will not be changed.
related #13429
Perform a git reset --hard before attempting to release awxkit to pypi.
We found that something new in the process was causing an unexpected behavior if the git tree had any changes inside it.
It would cause a devel version to be created and used as part of the upload which pypi was refusing.
Collections can not easly be deleted from galaxy so if we have to rerun a job because of a pypi or quay failure we don't want to try and upload the collection again.
* first deprecation pass, need to confirm date or version
* remove doc block updates as not needed, update runtime and remove symlinks
* add line to readme as notable release
* update version before release
* Workaround for events with NUL char, touch up error loop
This fixes an error where some events would not save
due to having the 0x00 character which errors in postgres
this adds a line to replace it with empty text
Hitting that kind of event put us in an infinite error loop
so this change makes a number of changes to prevent similar loops
the showcase example is a negative counter,
this is not realistic in the real world but works for unit tests
These error loop fixes seek to esablish the cases where we clear the buffer
Some logic is removed from the outer loop, with the idea that
ensure_connection will better distinguish flake
* From review comments, delay NUL char sanitization to later
Use pop to make list operations more clear
* Fix incorrect use of pop
Description
Thycotic has various types of Secret Templates like Password, SSH Key
Thycotic API returns str type for Password and of Type for class
requests.models.Response for SSH Key. Current implementation only
considers Password template. However when trying for SSH Key code
need return the str from response type requests.models.Response
Signed-off-by: Tarun CHawdhury <tarunchawdhury@gmail.com>
HC Vault clusters use eventual consistency and might return an HTTP 412
if the secret ID hasn't replicated yet to the replicas / standby nodes.
If this happens the request should be retried.
related #13413
Signed-off-by: Kristof Wevers <kristof.wevers@infura.eu>
With the latest version of rsyslog we had a test failing with:
AssertionError: Response data: {'error': "b'rsyslog internal message (3,-2455): could not transfer the specified internal posix capabilities settings to the kernel, capng_apply=-5\\n [v8.2102.0-107.el9 try https://www.rsyslog.com/e/2455 ]\\n'"}
Downgrading fixes it
If a job fails, we do receptor work results and put that output
into result_traceback.
We should only do this if
1. Receptor unit has failed
2. Runner callback processed 0 events
Otherwise we risk putting too much data into this field.
The hiredis 2.1.0 release doesn't provide source distribution on PyPi so
users can't build that python package from sources.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
Use a Makefile arg for the ansible-test sanity CLI args
defaults to --docker
in the future we probably need to customize python versions
Copy the rule exception for Ansible 2.15
this helps people who are running from Ansible devel
Also just ignore one sanity test for the export module, instead of
ignoring all of them.
Also use latest ansible-test, and make it work on GHA (by using podman
instead of docker).
Signed-off-by: Rick Elrod <rick@elrod.me>
* Run collection sanity tests in CI
This requires adding a Makefile install of ansible-core
Fake the version to make semver check happy
* Fixes from ansible-test sanity failures
* Exclude the export module due to awxkit requirement
* Fix broken ansible-test rule exceptions
remove Ansible 2.14 exclusions that make ansible-test ERROR, saying they are not needed
* fix pytz
* fix NameError
* fix tests and add sanity ignore files for import test until distutils replaced
* change static method to regular method and update test to instantiate class
- `settings/minikube.py` gets imported conditionally, when the
environment variable `AWX_KUBE_DEVEL` is set. In this imported file,
we set `BROADCAST_WEBSOCKET_PORT = 8013`, but 8013 is only used in the
docker-compose dev environment. In Kubernetes environments, 8052 is
used for everything. This is hardcoded awx-operator's ConfigMap.
- Also rename `minikube.py` because it is used for every kind of
development Kube environment, including Kind.
Signed-off-by: Rick Elrod <rick@elrod.me>
This fixes several things related to our wsbroadcast stats handling.
This was found during the ongoing wsrelay work.
There are really three fixes here:
- Logging was not actually enabled for the analytics.broadcast_websocket
module, so that has been added to our loggers config.
- analytics.broadcast_websocket was not actually able to connect to
Redis due to 68614b83c0 as part of
the work in #13187. But there was no easy way to know this because the
logging issue meant no exceptions showed up anywhere reasonable.
- Relatedly, and also as part of #13187, we jumped from
`prometheus-client` 0.7.1 up to 0.15.0. This included a breaking
change where a `Counter` ending with `_total` will clash with a
`Gauge` of the same name but without `_total`. I am not 100% sure of
the reasoning here, other than "OpenMetrics compatibility".
Refs #13301
Refs #13187
Signed-off-by: Rick Elrod <rick@elrod.me>
In #13200 the dev env was changed to make `verifysignature` optional,
dependent on a variable set before ansible gets run to set up the
`docker-compose` environment.
However along with that change, a change to the execution node install
bundle slipped in, which is seemingly unrelated to the dev env change
and is breaking some installs: #13234, ansible/awx-operator#1132.
I think this change was unintentional as it would at least require
another change in ansible/receptor-collection and maybe a change in
ansible/awx-operator as well.
Signed-off-by: Rick Elrod <rick@elrod.me>
This will allow users of the operator to set these settings
so from the start when the operator creates the default
execution queue they can control the max_forks and max_concurrent_jobs
on the default container group.
The intention of this feature is primarily to provide some notion of max
capacity of container groups, but the logic I've left generic. Default
is 0, which will be interpereted as no maximum number of jobs or forks.
Includes refactor of variable and method names for clarity.
instances_by_hostname is an internal attribute of TaskManagerInstances.
Clarify when we are expecting the actual TaskManagerInstances object.
Unify how we process running tasks and consume capacity. This has the
effect that we do less expensive work in after_lock_init and have 1 less
loop over all the running tasks. Previously we looped for both building
the dependency graph as well as for calculating the starting capacity of
all the instances and instance groups. Now we acheive both tasks in the
same loop.
Because of how this changes the somewhat subtle "do-si-do" of how to
initialize the Task Manager models, introduce a wrapper class that tries
to take some of that burden off of other areas where we re-use this like
in the serializer and the metrics. Also use this wrapper class to handle
nicities of how to track capacity consumption on instances and instance
groups.
Add tests for max_forks and max_concurrent_jobs
Fixup tests that use TaskManagerModels to accomodate changes.
assign ig before call to consume capacity
if we don't do it in that order, then we don't correctly account for
the container group jobs we are starting in the middle of the task
manager run
Since the original version of the migration a) invoked the .save()
method, and b) involved a model with a custom field that had a
post_save handler attached, this migration had a side-effect that
caused the codebase's version of the model to be used when the table
involved wasn't yet up to date. This triggers an UndefinedColumn error.
This change works around the problem by making use of queryset
.update() methods instead, which should avoid the post_save signal
trigger.
related to https://github.com/ansible/ansible/pull/78175
the way the GHA runner is built, Python runs with a mixed locale between the FS bits and the default encoding, which can cause unpredictable issues
adding env var `LC_ALL: "C.UTF-8"` prevent flakiness due to locale issue
Signed-off-by: Hao Liu <haoli@redhat.com>
Removing all >= dependencies as these were upgraded past the >= version with the last update.
The following libraries were secondary imports and were removed from the requirements.in as we are past the version required to fix their CVEs:
* autobhan
* kubernetes
* pyjwt
* sqlparse
aioredist was superceeded by redis
Someone referenced this directly but didn't add it to requirements.in. So when we upgraded channels-redis and it dropped aioredis this started failing
Certs are generated on the host and there is currently an issue due to openssl version mispatch between Fedora 36 and CentOS Stream 8 which causes:
tools_awx_1 | ERROR 2022/11/15 17:09:17 could not load signing key file: unknown block type PRIVATE KEY
tools_awx_1 | ERROR 2022/11/15 17:09:17 could not load signing key file: unknown block type PRIVATE KEY
* Facts scaling fixes for large inventory, timing issue
Move save of Ansible facts to before the job status changes
this is considered an acceptable delay with the other
performance fixes here
Remove completely unrelated unused facts method
Scale related changes to facts saving:
Use .iterator() on queryset when looping
Change save to bulk_update
Apply bulk_update in batches of 100, to reduce memory
Only save a single file modtime, avoiding large dict
Use decorator for long func time logging
update decorator to fill in format statement
* Fixes#13119#13120 Cloud support & update brand
* rm base64 import to pass lint
* Update references across the board
* Removed final reference to CyberArk Conjur Secret Lookup
Once since it is defined as a CustomCommand subclass, and once because
it is an endpoint at the /api/v2/ level. With Python 3.11 argparse
has become more strict and will raise an exception when you try to
inject duplicate subparsers.
- enable schema upload to s3 bucket for feature branch
- add workflow to delete schema from s3 bucket when feature branch is deleted
Signed-off-by: Hao Liu <haoli@redhat.com>
Previously, in some cases, an InventoryUpdate sourced by an SCM project
would still run and be successful even after the project it is sourced
from failed to update. This would happen because the InventoryUpdate
would revert the project back to its last working revision. This
behavior is confusing and inconsistent with how we handle jobs (which
just refuse to launch when the project is failed).
This change pulls out the logic that the job launch serializer and
RunJob#pre_run_hook had implemented (independently) to check if the
project is in a failed state, and puts it into a method on the Project
model. This is then checked in the project launch serializer as well as
the inventory update serializer, along with
SourceControlMixin#sync_and_copy as a fallback for things that don't run
the serializer validation (such as scheduled jobs and WFJT jobs).
Signed-off-by: Rick Elrod <rick@elrod.me>
This takes some logic out of the queryset logic,
using some established assumptions about the task manager
if a job lands on a hybrid node (or is a project update) then
it will have the same controller and execution node
With that established, the queryset can be simplified
Really these could get any of the unified job template types, not just
system job templates, so importing e.g. a project with a schedule was
doing them in the wrong order.
Also, bump the timeout of the project update and make sure that we
stash it in the page cache even if it doesn't finish in 5 minutes.
when running `make ui-devel`. Previously they were going to
/awx_devel/awx/public/static, but that directory is no longer being
served up by nginx, which forced us to have to run `make
collectstatic` (or equivalent) to get the files to the right place.
More fun in the grafana dashboard. The rows organize the panels and are
collapsable. Also, tested with multiple nodes and fixed some
labeling issues when there are more than one node.
Update grafana alerting readme info and some fun prose about one of the
alerts as well as some reorganizing of the code for clarity.
finally, drop the time to fire for alerts because it's better to have them be a bit touchy so users can verify they work vs. not being sure.
* initial commit of hostname validation to InstanceSerializer
Co-authored-by: Cesar Francisco San Nicolas Martinez <cesarfsannicolasmartinez@gmail.com>
- Firstly -- add a bunch of unit tests for `update_scm_url`, because it
previously had none and desperately needed them.
- Secondly -- fix#12992 by adding back in IPv6 address brackets if they
existed in the first place when the function was called.
- Thirdly -- fix a related case where we disallowed IPv6 in URLs that
did not include the scheme.
Signed-off-by: Rick Elrod <rick@elrod.me>
Extrapolating reconciliation of desired and actual states to a function
Converting heave prefect related methods to user focus for query optimization
Converting from get_or_create to simply create
Added memory calculations for query optimization
* fix name to be consistent
this is not a mean, its the last value
so say that in the name
* add remaining capacity to dashboard
also make legends pretty with nice names
- stdout output on events was being double HTML entity encoded meaning
that all output with < and > was shown as literal "<" and ">"
Signed-off-by: Rick Elrod <rick@elrod.me>
The initial check performed case insensitive searches and the new method was case sensitive
The optimization of the new method is likely not going to contribute noticable slowness
This will enable us to provide more useful information for the user,
now that all user-triggered health checks are async.
Also, de-bounce the health check endpoint to not allow additional
health check tasks to be triggered when one is already in progress.
We don't specify defaults in the module (because it messes up Instance
updates because AWX things we are trying to change things to be the
default).
- Update the docs to remove the defaults that no longer exist
- Update tests to make them pass (oops)
- Fix tangentially related typo in Kind development docs
Signed-off-by: Rick Elrod <rick@elrod.me>
We were disabling the field when a user did not have sufficient permissions to create an Inventory. I updated this logic to check if a user has use permissions on the selected inventory before disabling the field.
typo in URL and in grafana alert rule
Important learning: no newlines in rules/equations
turns out datasourceUid can be set in prometheus_source.yml, and it can be anything we want. So I have set it to awx_alert, the PBFAnumbersetc value it was set to before was an autogenerated UID, and it would actually work just with that generated value, but because we want it to make sense, we're setting the value in prometheus_source.yml
finally, update the docs to be reflective of grafana docs and how to export new rules a user might want to add.
Co-authored-by: Elijah DeLee <kdelee@redhat.com>
- Prevents changing hostname, listener_port, or node_type for instances
that already exist
- API default node_type is execution
- API default node_state is installed
awx-web container does not have access to receptor socket, and the
execution node health check requires receptorctl.
This change runs the health check asynchronously in the task container.
After all jobs on the node are complete, delete the node then
broadcast the write_receptor_config task.
Also, make sure that write_receptor_config updates the state of links
that are in 'adding' state.
- allow the node_state to be set to deprovisioning
- set the links that touch the instance to removing
- only allow on K8S
- only allow to be done to execution nodes
- use dotted circles to represent `enabled: false`
- use solid circle stroke to represent `enabled: true`
- excise places where `Unavailable` node state is used in the UI.
add scaffolding for instance install_bundle endpoint
- add instance_install_bundle view (does not do anything yet)
- add `instance_install_bundle` related field to serializer
- add `/install_bundle` to instance URL
- `/install_bundle` only available for execution and hop node
- `/install_bundle` endpoint response contain a downloadable tgz with moc data
TODO: add actual data to the install bundle response
Signed-off-by: Hao Liu <haoli@redhat.com>
when a new remote execution/hop node is added
regenerate the receptor.conf for all control node to
peer out to the new remote execution node
Signed-off-by: Hao Liu <haoli@redhat.com>
Co-Authored-By: Seth Foster <fosterseth@users.noreply.github.com>
Co-Authored-By: Shane McDonald <me@shanemcd.com>
- node_state is now read only
- node_state gets set automatically to Installed in the create view
- raise a validation error when creating on non-K8S
- allow SystemAdministrator the 'add' permission for Instances
- expose the new listener_port field
- nodes with states Provisioning, Provisioning Fail, Deprovisioning,
and Deprovisioning Fail should bypass health checks and should never
transition due to the existing machinery
- nodes with states Unavailable and Installed can transition to Ready
if they check out as healthy
- nodes in the Ready state should transition to Unavailable if they
fail a check
Remove corresponding views for job instance_groups
Validate job_slice_count in API
Remove defaults from some job launch view prompts
the null default is preferable
Additionally, move the inventory-specific hacks of yesteryear
into the prompts_dict method of the WorkflowJob model
try to make it clear exactly what this is hacking and why
Correctly summarize label prompts, and add missing EE
Expand unit tests to apply more fields
adding missing fields to preserve during copy to workflow.py
Fix bug where empty workflow job vars blanked node vars (#12904)
* Fix bug where empty workflow job vars blanked node vars
* Fix bug where workflow job has no extra_vars, add test
* Add empty workflow job extra vars to assure fix
Removing try/except around instance_groups
Removing redefined execution_environment
Reordering labels/creds/igs/ee/etc
Removing special treatment for EEs when doing setattrs
Adding help_text to execution environments
Adding EE serializer on JobCreateScheduleSerializer
Remove if-not-data conditional from WFJTnode.can_change
these are cannonical for can_add, but this looks like a bug
Change JTaccess.can_unattach to call same method in super()
previously called can_attach, which is problematic
Better consolidate launch config m2m related checks
Test and fix pre-existing WFJT node RBAC bug
recognize not-provided instance group list on launch, avoiding bug where it fell back to default
fix bug where timeout field was saved on WFJT nodes after creating approval node
remove labels from schedule serializer summary_fields
remove unnecessary prefetch of credentials from WFJT node queryset
Fixes bug where Forks showed up in both default values and prompted values in launch summary
Fixes prompting IGs with defaults on launch
Make job tags and skip tags full width on workflow form
Fixes bug where we attempted to fetch instance groups for workflows
Fetch default instance groups from jt/schedule for schedule form prompt
Grab default IGs when adding a node that prompts for them
Adds support for saving labels on a new wf node
Fix linting errors
Fixes for various prompt on launch related issues
Adds support for saving instance groups on a new node
Adds support for saving instance groups when editing an existing node
Fix workflowReducer test
Updates useSelected to handle a non-empty starting state
Fixes visualizerNode tests
Fix visualizer test
Second batch of prompt related ui issues:
Fixes bug saving existing node when instance groups is not promptable
Fixes bug removing newly added label
Adds onError function to label prompt
Fixes tooltips on the other prompts step
Properly fetch all labels to show on schedule details
This removes a loop that ran on import
the loop was giving the wrong behavior
and it initialized too many fields as char_prompts fields
With this, we will now enumerate the char_prompts type fields manually
Adds support for prompting labels on launch in the UI
Fix execution environment prompting in UI
Round out support for prompting all the things on JT launch
Adds timeout to job details
Adds fetchAllLabels to JT/WFJT data models
Moves labels methods out to a mixin so they can be shared across JTs/WFJTs/Schedules
Fixes bug where ee was not being sent on launch
Adds the ability to prompt for ee's, ig's, labels, timeout and job slicing to schedules
Fixes bug where saving schedule form without opening the prompt would throw errors
Adds support for IGs and labels to workflow node prompting
Adds support for label prompting to node modal
Fix job template form tests
* Making almost all fields promptable on job templates and config models
* Adding EE, IG and label access checks
* Changing jobs preferred instance group function to handle the new IG cache field
* Adding new ask fields to job template modules
* Address unit/functional tests
* Adding migration file
Adds prompt on launch buttons to labels, forks, job slicing, timeout, and instance groups
Adds prompting for labels on workflow job template
Updates flags that denote when prompting is necessary in various places
Adds prompting support for timeout, job slicing, forks, labels, instance groups and execution environments to the prompt details
Show prompted ee, forks, job slice and labels on schedule details
Adds support for ee, labels, forks, job slicing and timeout prompting to the node view modal
Add default values when prompting for ee's, forks, job slicing and timeout
Adds launch prompt step for execution environments
Adds fields for timeout, job slicing and forks to other prompts step of launch
- Fix out of scope variable in error message in the action plugin
- Rename action plugin from playbook_integrity to verify_project
Refs #12887 which pointed out the out of scope variable
Signed-off-by: Rick Elrod <rick@elrod.me>
This rule alerts if the redis queue is larger than what the rolling
average event insertion rate/second * 120. In other words, if the redis
queue is larger than it appears we can process events in two minutes.
It appears it has to meet this condition for 60 seconds to start firing.
Future commits will address how to configure contact points like slack.
shout out to @jainnikhil30 and @rebeccahhh who figured this out in jam
session this morning.
When launching a job template, if the last project update failed due to
signature validation, show an error that actually says that.
Signed-off-by: Rick Elrod <rick@elrod.me>
This was missed when we landed #12813. Adds cryptography
kind to the CredentialType allowed kinds list, which now
produces the proper error message when attempting to PUT
to modify the managed credential type.
Signed-off-by: Rick Elrod <rick@elrod.me>
Rather than only allowing the signature credential to be specified on
project using git, allow it to be specified on any project at all.
This moves the field to always show, and moves it out of the git
subform.
Signed-off-by: Rick Elrod <rick@elrod.me>
add new managed credential type for gpg pub key
add migration file to setup managed credential types to add the new credential type
Signed-off-by: Hao Liu <haoli@redhat.com>
- Extract how slicing is done from Inventory#get_script_data and pull it
into a new method, Inventory#get_sliced_hosts
- Make use of this method in Inventory#get_script_data
- Make use of this method in Job#_get_inventory_hosts (used by
Job#start_job_fact_cache and Job#finish_job_fact_cache).
This fixes an issue (namely in Tower 4.1) where job slicing with fact
caching enabled doesn't save facts for all hosts.
Signed-off-by: Rick Elrod <rick@elrod.me>
If canceled attempted before, still allow attempting another cancel
in this case, attempt to send the sigterm signal again.
Keep clicking, you might help!
Replace other cancel_callbacks with sigterm watcher
adapt special inventory mechanism for this too
Get rid of the cancel_watcher method with exception in main thread
Handle academic case of sigterm race condition
Process cancelation as control signal
Fully connect cancel method and run_dispatcher to control
Never transition workflows directly to canceled, add logs
We should be consistent about this. Also this takes us from doing a as
many queries to the UnifiedJob table as we have instances to doing 1
query to the UnifiedJob table (and both do 1 query to Instances table)
Reasoning:
- This is breaking the UI in official image builds of devel
- This is always being overridden in our packaging
- PROJECTS_ROOT and JOBOUTPUT_ROOT also hardcode /var/lib/awx
Make it so that submitting a task to the dispatcher happens as part of the transaction.
this applies to dispatcher task "publishers" which NOTIFY the pg_notify queue
if the transaction is not successful, it will not be sent, as per postgres docs
This keeps current behavior for pg_notify listeners
practically, this only applies for the awx-manage run_dispatcher service
this requires creating a separate connection and keeping it long-lived
arbitrary code will occasionally close the main connection, which would stop listening
Stop sending the waiting status websocket message
this is required because the ordering cannot be maintained with other changes here
the instance group data is moved to the running websocket message payload
Move call to create_partition from task manager to pre_run_hook
mock this in relevant unit tests
Avoid running jobs that have already been reapted
Co-authored-by: Elijah DeLee <kdelee@redhat.com>
Remove unnecessary extra actions
Fix waiting jobs in other cases of reaping
add `make help`
that prints all available make targets
help text generated from comments above the make target starting with `##`
Signed-off-by: Hao Liu <haoli@redhat.com>
When on the screen in the UI that loads the job events, the ui includes
a filter to exclude job events where stdout = ''. Because this is a
TextField and was not in the allow list, we were applying DISTINCT to
the query. This made it very unperformant for large jobs, especially
on the query that gets the count and cannot put a LIMIT on the query.
Also correctly prefetch the related job_template data on the view to
cut down the number of queries we make from around 50 to under 10.
We need to analyze other similar views for other prefetch type
optimizations we should make.
* refactor ScheduleFormFields into own file
* refactor ScheduleForm
* wip complex schedules form
* build rruleset from inputs
* update schedule form validation for multiple repeat frequencies
* add basic rrule set parsing when opening schedule form
* complex schedule bugfixes, handle edge cases, etc
* fix schedule saving/parsing for single-occurrence schedules
* working with timezone issues
* fix rrule until times to be in UTC
* update tests for new schedule form format
* update ouiaIds
* tweak schedules spacing
* update ScheduleForm tests
* show message for unsupported schedule types
* default schedules to browser timezone
* show error type/message in ErrorDetail
* shows frequencies on ScheduleDetails view
* handles nullish values
* Forcing an unbind for a django-auth-ldap sticky session to the LDAP server
* Focring _connection_bound to false after closing and modifying exceptino logging
Introduce build_project_dir method
the base method will create an empty project dir for workdir
Share code between job and inventory tasks with new mixin
combine rest of pre_run_hook logic
structure to hold lock for entire sync process
force sync to run for inventory updates due to UI issues
Remove reference to removed scm_last_revision field
When creating unified job, stash the list of pk values from the
instance groups returned from preferred_instance_groups so that the
task management system does not need to call out to this method
repeatedly.
.preferred_instance_groups_cache is the new field
Instead of loading all pending Workflow Approvals in the task manager,
run a query that will only return the expired apporovals
directly expire all which are returned by that query
Cache expires time as a new field in order to simplify WorkflowApproval filter
We always add the job to the graph right before calling start task.
Reduce complexity of proper operation by just doing this in start_task,
because if you call start_task, you need to add it to the dependency
graph
we had tried doing this in the WorkflowManager, but we decided that
we want to handle ALL pending jobs and "soft blockers" to jobs with the
TaskManager/DependencyGraph and not duplicate that logic in the
WorkflowManager.
add settings to define task manager timeout and grace period
This gives us still TASK_MANAGER_TIMEOUT_GRACE_PERIOD amount of time to
get out of the task manager.
Also, apply start task limit in WorkflowManager to starting pending
workflows
more than saving the loop, we save building the WorkflowDag twice which
makes LOTS of queries!!!
Also, do a bulk update on the WorkflowJobNodes instead of saving in a
loop :fear:
implement https://github.com/ansible/awx/issues/12446
in development environment, enable set of views that run
the task manager(s).
Also introduce a setting that disables any calls to schedule()
that do not originate from the debug views when in the development
environment. With guards around both if we are in the development
environment and the setting, I think we're pretty safe this won't get
triggered unintentionally.
use MODE to determine if we are in devel env
Also, move test for skipping task managers to the tasks file
- DependencyManager spawns dependencies if necessary
- WorkflowManager processes running workflows to see if a new job is
ready to spawn
- TaskManager starts tasks if unblocked and has execution capacity
Change:
- Case-insensitive search only makes sense on strings, so check the
type of the field we are searching and ensure it is a string field
(TextField, CharField, or some subclass thereof).
- This prevents a 500 error when a user uses iexact on, e.g., an
integer field. Now, a 400 Bad Request is returned instead.
Test Plan:
- Added simple unit tests for iexact
Tickets:
- Fixes#9222
Signed-off-by: Rick Elrod <rick@elrod.me>
* Modifying SAML adapter to not auto-add default galaxy creds to orgs on login
* Adding test, fixing old tests and moving add_default_galaxy_credential to pipeline
* Enhanced detail component to handle cases with no values, and refactored components that use detail component.
* Add optional chaining operators where necessary to pass test cases
* add test cases to test suites of modified files
Co-authored-by: Veda Periwal <vperiwal@vperiwal-mac.attlocal.net>
This optimizes the ActivityStreamSerializer by only getting many-to-many
relationships that are speculatively non-empty
based on information we have in other fields
We run this every time we create an object as an on_commit action
so it is expected this will have a major impact on response times for launching jobs
fixes https://github.com/ansible/awx/issues/7946
- added WorkflowApprovalTemplate page type to allow URL registration
- added resources regex that’s associated resource URL with WorkflowApprovalTemplate
- registered the new resource regex with WorkflowApprovalTemplate page type
- modified `DEPENDENT_EXPORT` handling (insisted by @jbradberry)
- added special case handling for WorkflowApprovalTemplate due to its unique nature
unique nature of WorkflowApprovalTemplate
- when exporting WorkflowJobTemplate with approval node the WorkflowJobTemplateNode need to contain a related "create_approval_template" the POST data for "create_approval_template" need to come from the "workflow_approval_template"
- during the export of a WorkflowJobTemplateNode that is an approval node we need to get the data from "workflow_approval_template" and use that to populate the "create_approval_template"
Co-Authored-By: Jeff Bradberry <685957+jbradberry@users.noreply.github.com>
Signed-off-by: Hao Liu <haoli@redhat.com>
* Added help text to schedule form and detail with link to documentation
* Added test cases for help text in schedule form and detail
* Add help text to schedule form and detail with link to documentation
Co-authored-by: Veda Periwal <vperiwal@vperiwal-mac.attlocal.net>
* Reap jobs on dispatcher startup to increase clarity, replace existing reaping logic
* Exit jobs if receiving SIGTERM signal
* Fix unwanted reaping on shutdown, let subprocess close out
* Add some sanity tests for signal module
* Add a log for an unhandled dispatcher error
* Refine wording of error messages
Co-authored-by: Elijah DeLee <kdelee@redhat.com>
* add database connection to the metrics endpoint
* bump the counts collector version to 1.2
* check for postgresql as database so to not break the tests
* Track combined artifacts on workflow jobs
* Avoid schema change for passing nested workflow artifacts
* Basic support for nested workflow artifacts, add test
* Forgot that only does not work with polymorphic
* Remove incorrect field
* Consolidate logic and prevent recursion with UJ artifacts method
* Stop trying to do precedence by status, filter for obvious ones
* Review comments about sets
* Fix up bug with convergence node paths and artifacts
Adding standard subject line to triage_replies.md
Removing PR commit generated change log in favor of github auto-commit log
Updating some images
Adding AWX matrix chanel to IRC notifications
Adding references between operator and AWX releases
we've observed this in development and some users have reported experiencing 500's on /api/v2/metrics because of a key error here where a metric is missing from a certain instance
- listen specifically within awx/awx, so that changes in awxkit or
awx_collection don't trigger spurious reloads
- expand the exclude pattern to ignore the test directories
* Removing old awxbot files
* Removing security bug report as GitHub now shows the security piolicy from /SECURITY.md
* Changing feature_request from md to yml
* Adding additional options to bug report components andinstall method
* Removing old ISSUE_TEMPLATE.md
* Changing issue type and adding additional components
* Removing auto-generated change log
* Adding awx_collection and cli components
* Changing content search pattern for type labels
* Changing from collection to awx_collection tag and adding dependencies tag
* Adding unicode bug to bug repot to match feature unicode character
* Changing bug to bug or docs
* Remove docker on * and boot2docker infavor of docker development environmnet
* Create top level issue with: CoC, Enterprise, Top level help
* Remove old CODEOWNERS file
* Show add access button if it is a system admin
* Hide access button if the user is credential admin, org admin, but the
credential does not belong to any org.
* Show access button if the user is a credential admin, org admin, and
the credential is associated to an org.
* Show access button if the user is an org admin and the credential is
associated to the org.
All those permutations are allowed by the API RBAC.
This PR update UX to not allow the user to attempt to perform any
action that will raise an error when modifying access to the
credentials.
Update project status to reflect project update sync related to job
template that was launched with branch override.
We were displaying status of project sync itself, not from the project
update job as expected.
Also, rename `Project Status` to be `Project Update Status`.
See: https://github.com/ansible/awx/issues/11987
* Logout as User A and Login as User B redirects to `/home'
* Logout as User A and Login as User A redirects to `/home'
* Allow session to timeout as User A and Login as User A redirects to User A's last location
See: https://github.com/ansible/awx/issues/11167
grafana via prometheus.
This metric is a good indicator of how far behind the callback receiver
is. The higher the load the further behind/the greater the number of
seconds the metric will display.
This number being high may indicate the need for horizontal scaling in
the control plane or vertically scaling the number of callback
receivers.
This is particularly useful when you are using the @filepath version
of the flag, since otherwise there would be no way to issue the
command with multiple vars files.
Also, add `-e` as an alias to `--extra_vars`
- scm based inventory sources should launch project updates prior to
running inventory updates for that source.
- fixes scenario where a job is based on projectA, but the inventory
source is based on projectB. Running the job will likely trigger a
sync for projectA, but not projectB.
comments
Add details related workflow job on the work flow approval details
Remove not used prop isLoading, fix, and expand unit-tests related to
workflow approval details.
* Only use in-memory cache for database settings
Make necessary adjustments to monkeypatch
as it is very vunerable to recursion
Remove migration exception that is now redundant
Clear cache if a setting is changed
* Use dedicated middleware for setting cache stuff
Clear cache for each request
* Add tests for in-memory cache
* We observed daphne giving tracebacks when accessing logging settings.
Originally, configure tower in tower settings was no a suspect because
daphne is not multi-process. We've had issues with configure tower in
tower settings and multi-process before. We later learned that Daphne
is multi-threaded. Configure tower in tower was back to being a
suspect. We constructed a minimal reproducer to show that multiple
threads accessing settings can cause the same traceback that we saw in
daphne. See
https://gist.github.com/chrismeyersfsu/7aa4bdcf76e435efd617cb078c64d413
for that recreator. These fixes stop the recreation.
- The `z` option indicates that the bind mount content is shared among multiple containers.
- The `Z` option indicates that the bind mount content is private and unshared.
If multiple container attempt to mount the same directory `Z` option will cause a raise condition where only the last container started will have access to the file.
Ref: https://docs.docker.com/storage/bind-mounts/#configure-the-selinux-label
Signed-off-by: Hao Liu <haoli@redhat.com>
Now that we are adding popovers for details pages, I noticed a couple of
strings wrapping in odd places, update css to avoid that.
Also `word-break: break-word` was deprecated.
* Delay update of artifacts until final job save
Save tracebacks from receptor module to callback object
Move receptor traceback check up to be more logical
Use new mock_me fixture to avoid DB call with me method
Update the special runner message to the delay_update pattern
* Move special runner message into post-processing of callback fields
* Track host_status_counts and use that to process notifications
* Remove now unused setting
* Back out changes to callback class not needed after all
* Skirt the need for duck typing by leaning on the cached field
* Delete tests for deleted task
* Revert "Back out changes to callback class not needed after all"
This reverts commit 3b8ae350d218991d42bffd65ce4baac6f41926b2.
* Directly hardcode stats_event_type for callback class
* Fire notifications if stats event was never sent
* Remove test content for deleted methods
* Add placeholder for when no hosts matched
* Make field default be None, denote events processed with empty dict
* Make UI process null value for host_status_counts
* Fix tracking of EOF dispatch for system jobs
* Reorganize EVENT_MAP into class properties
* Consolidate conditional I missed from EVENT_MAP refactor
* Give up on the null condition, also applies for empty hosts
* Remove cls position argument not being used
* Move wrapup method out of class, add tests
* Added schedule_rruleset lookup plugin for awx.awx
* Added DB migration for rrule size
* Updated schedule docs
* The schedule API endpoint will now return an array of errors on rule validation to try and inform the user of all errors instead of just the first
* Remove committed_capacity field, delete supporting code
* Track consumed capacity to solve the negatives problem
* Use more verbose name for IG queryset
* move static methods used by task manager
These static methods were being used to act on Instance-like objects
that were SimpleNamespace objects with the necessary attributes.
This change introduces dedicated classes to replace the SimpleNamespace
objects and moves the formerlly staticmethods to a place where they are
more relevant instead of tacked onto models to which they were only
loosly related.
Accept in-memory data structure in init methods for tests
* initialize remaining capacity AFTER we built map of instances
* create a singular page with listed replies that can be copy and pasted for mailing list and bug scrub purposes
Co-authored-by: Alicia Cozine <879121+acozine@users.noreply.github.com>
I verified what Seth found in https://github.com/ansible/awx/pull/12052, but would really hate to lose this functionality. Curious if folks on the API team can try this and see if it works for them.
By using .only we select fewer columns, avoiding potentially large
fields that we never reference.
Also, small tweak to eliminate what was a duplicate dictionary of
hostname:instance, because we don't need build and carry two copies of
the same data.
Update when deleted is show on job details.
Some job types should not display inventory or projects, update when
showing those fields.
Also, update when displaying information when
those fields where deleted.
See: https://github.com/ansible/awx/issues/12008
and then switch from using order by ID as a fallback for all ordering and instead
just set instances ordering to ID as default to prevent
OrderedManyToMany fields ordering from being interrupted.
Escape name__regex and name__iregex. Escaping the value for those
keys when creating a smart inventory is a work around for the
pyparsing code on the API side for special characters. This will just
display an extra escape when showing the host_filter on details page.
* use new children-summary endpoint data to traverse job event tree
* update job output tests for new children summary data
* force flat mode if event child summary fails to load
* update childrenSummary data for endpoint changes
* don't add jobs to job tree until children summary loaded
* force job output into flat mode if job processing not complete
- returns a special view to output the total number of children (and
grandchildren) events for all parents events for a job
value is the number of total children of that event
- intended to be consumed by the UI, as an efficient way to get the
number of children for a particular event
- see api/templates/api/job_job_events_children_summary.md for more info
* Grafana notifications: Fix panel/dashboardId type
Latest grafana fails with
Error sending notification grafana: 400
[{"classification":"DeserializationError",
"message":"json: cannot unmarshal string into Go struct
field PostAnnotationsCmd.dashboardId of type int64"}]
So ensure the IDs are really int and not strings.
* Fix the dashboard/panelId=0 case
0 is avlaid valid for the ID's, so ensure to allow them.
* Update tests to new behavior
Panel/Dashboard Id fields are not sent if they where not requested.
Alos add tests for the ID=0 case.
* Simple patches to make jobs robust to database restarts
* Add some wait time before retrying loop due to DB error
* Apply dispatcher downtime setting to job updates, fix dispatcher bug
This resolves a bug where the pg_is_down property
never had the right value
the loop is normally stuck in the conn.events() iterator
so it never recognized successful database interactions
this lead to serial database outages terminating jobs
New setting for allowable PG downtime is shared with task code
any calls to update_model will use _max_attempts parameter
to make it align with the patience time that the dispatcher
respects when consuming new events
* To avoid restart loops, handle DB errors on startup with prejudice
* If reconnect consistently fails, exit with non-zero code
* Update runtime.yml
* Extending test_completness to include meta/runtime.yml and adding remaining missing modules from runtime.yml
Co-authored-by: quasd <quasd@users.noreply.github.com>
There was a race condition because the callback reciever tried to run this code:
File "/awx_devel/awx/main/management/commands/run_callback_receiver.py", line 31, in handle
CallbackBrokerWorker(),
File "/awx_devel/awx/main/dispatch/worker/callback.py", line 49, in __init__
self.subsystem_metrics = s_metrics.Metrics(auto_pipe_execute=False)
File "/awx_devel/awx/main/analytics/subsystem_metrics.py", line 156, in __init__
self.instance_name = Instance.objects.me().hostname
Before get_or_register was being called by the dispatcher.
Occasionally the create_partition will error with,
relation "main_projectupdateevent_20220323_19" already exists
This change wraps the db command into a try except block with its
own transaction
This JSONBlob field type is a wrapper around Django's new generic
JSONField, but with the database column type forced to be text. This
should behave close enough to our old wrapper around
django-jsonfield's JSONField and will avoid needing to do the
out-of-band database migration.
* We trigger notifications when the callback receiver processes the
playbook_on_stats event. This is the last event in ansible-playbook and
the process should exist very shortly after this event is emitted. The
trouble comes in with the isolated node feature. There is a management
playbook that runs periodically that pulls the events from the remote
node. It's possible that the management playbooks runs, gets the
playbook_on_stats event, but does not see that the playbook is finished
running. Therefore the job status is still seen as 'running' BUT we have
kicked of the notification for the job. The notification worker will
enter a loop waiting on the job to enter the finished state. In this
case the time it takes for the job to enter the finished state can be
long, roughly 2 * the management playbook run time.
* This new setting allows the user to increase the time that the
notification spends waiting for the job to enter the finished state.
Add related job templates to a couple of screens. Credential and
Inventory.
Also refactor the component already in place for Projects to be in sync
with the Job Templates screen.
See: https://github.com/ansible/awx/issues/5867
Add several changes to API and UI related to Instance Groups.
* Update summary_fields for DEFAULT_CONTROL_PLANE_QUEUE_NAME, and
DEFAULT_EXECUTION_QUEUE_NAME. Rely on API validation for those fields.
* Fix Instance Group list RBAC
* Add validation for a couple of fields on the Instance Groups endpoint
1. is_container_group
2. policy_instance_percentage
3. policy_instance_list
See: https://github.com/ansible/awx/issues/11130
Also: https://github.com/ansible/awx/issues/11718
This will hopefully get us past the unfortunate check against the
HostMetric table, which doesn't exist when you are upgrading from 3.8
to 4.x.
Additionally, guard against AUTH_LDAP_GROUP_TYPE not being in settings
for conf migration 0006.
- the default auto-increment primary key field type is now
configurable, and Django's check command issues a warning if you are
just assuming the historical behavior of using AutoField.
- Django 3.2 brings in automatic AppConfig discovery, so all of our
explicit `default_app_config = ...` assignments in __init__.py
modules are no longer needed, and raise a RemovedInDjango41Warning.
- upgrades
- Django 3.2.12
- pytz 2021.3 (from 2019.3)
- oauthlib 3.2.0 (from 3.1.0)
- requests-oauthlib 1.3.1 (from 1.3.0)
- django-guid 3.2.1 (from 2.2.1)
- django-solo 2.0.0 (from 1.1.3)
- django-taggit 2.1.0 (from 1.2.0)
- netaddr 0.8.0 (from 0.7.19)
- pyrad 2.4 (from 2.3)
- django-radius devel (from 1.3.3)
- future devel (from 0.16.0)
- django-guid, django-solo, and django-taggit are upgraded to fix the
AppConfig deprecation warning. FIXME: django-guid devel has the
fix, but it hasn't been released yet.
- Released versions of django-radius have a hard-coded pin to
future==0.16.0, which has a Python warning due to an improperly
escaped character. This is fixed in future devel, so for now we are
pinning to references to the git repos.
- netaddr had a bunch of Python syntax and deprecation warnings
* Process unresponsive and newly responsive hop nodes
* Use more natural way to zero hop node capacity, add test
* Use warning as opposed to warn for log messages
It previously depended on a private Django internal class that changed
with Django 3.1.
I've switched here instead to disabling the django-polymorphic
accessors to get the underlying UnifiedJob object for a Job, which due
to the way they implement those was resulting in N+1 behavior on
deletes. This gets us back most of the way to the performance gains
we achieved with the custom collector class. See
https://github.com/django-polymorphic/django-polymorphic/issues/198.
- FieldDoesNotExist now has to be imported from django.core.exceptions
- Django docs specifically say not to import
django.conf.global_settings, which now has the side-effect of
triggering one of the check errors
The event_data field on event models, however, is getting an
overridden version that retains the underlying text data type for the
column, to avoid a heavy data migration on those tables.
Also, certain of the larger tables are getting these fields with the
NOT NULL constraint turned off, to avoid a long migration.
Remove the django.utils.six monkey patch we did at the beginning of
the upgrade.
- inspect.getargspec() -> inspect.getfullargspec()
- register pytest.mark.fixture_args
- replace use of DRF's deprecated NullBooleanField
- fix some usage of naive datetimes in the tests
- fix some strings with backslashes that ought to be raw strings
- Django's PostgreSQL JSONField wraps values in a JsonAdapter, so deal
with that when it happens. This goes away in Django 3.1.
- Setting related *_id fields clears the actual relation field, so
trying to fake objects for tests is a problem
- Instance.objects.me() was inappropriately creating stub objects
every time while running tests, but some of our tests now create
real db objects. Ditch that logic and use a proper fixture where needed.
- awxkit tox.ini was pinned at Python 3.8
- upgrades
- Django 3.0.14
- django-jsonfield 1.4.1 (from 1.2.0)
- django-oauth-toolkit 1.4.1 (from 1.1.3)
- Stopping here because later versions have changes to the
underlying model to support OpenID Connect. Presumably this can
be dealt with via a migration in our project.
- django-guid 2.2.1 (from 2.2.0)
- django-debug-toolbar 3.2.4 (from 1.11.1)
- python3-saml 1.13.0 (from 1.9.0)
- xmlsec 1.3.12 (from 1.3.3)
- Remove our project's use of django.utils.six in favor of directly
using six, in awx.sso.fields.
- Temporarily monkey patch six back in as django.utils.six, since
django-jsonfield makes use of that import, and is no longer being
updated. Hopefully we can do away with this dependency with the new
generalized JSONField brought in with Django 3.1.
- Force a json decoder to be used with all instances of JSONField
brought in by django-jsonfield. This deals with the 'cast to text'
problem noted previously in our UPGRADE_BLOCKERS.
- Remove the validate_uris validator from the OAuth2Application in
migration 0025, per the UPGRADE_BLOCKERS, and remove that note.
- Update the TEMPLATES setting to satisfy Django Debug Toolbar. It
requires at least one entry that has APP_DIRS=True, and as near as I
can tell our custom OPTIONS.loaders setting was effectively doing
the same thing as Django's own machinery if this setting is set.
The Member role can derive from e.g. the Org Admin role, so basically
all organization and team roles should be assigned first, so that RBAC
conditions are met when assigning later roles.
This is to avoid references to settings in threads,
this is known to create problems when caches expire
this leads to KeyError in environments with heavy load
* Fix integer/float errors in survey
* Add SURVEY_TYPE_MAPPING to constants
Add SURVEY_TYPE_MAPPING to constants, and replace usage in a couple of
files.
Co-authored-by: Alexander Komarov <akomarov.me@gmail.com>
* Changing session cookie name and added a way for clients to know what the key name is
* Adding session information to docs
* Fixing how awxkit gets the session id header
Right now, without this, we end up with a different number for max_workers than max_forks. For example, on a control node with 16 Gi of RAM,
max_mem_capacity w/ 100 MB/fork = (16*1024)/100 --> 164
max_workers = 5 * 16 --> 80
This means we would allow that control node to control up to 164 jobs, but all jobs after the 80th job will be stuck in `waiting` waiting for a dispatch worker to free up to run the job.
Sharing the /etc/redhat-access-insights is no longer
required for EEs. Furthermore, this fixes a SELinux issue
when launching multiple jobs with concurrency and fact_caching enabled.
i.e:
lsetxattr /etc/redhat-access-insights: operation not permitted
fix memory and cpu settings to suport k8s resource request format
* fix conversion of memory setting to bytes
This setting has not been getting set by default, and needed some fixing
up to be compatible with setting the memory in the same way as we set it
in the operator, as well as with other changes from last year which
assume that ansible runner is returning memory in bytes.
This way we can start setting this setting in the operator, and get a
more accurate reflection of how much memory is available to the control
pod in k8s.
On platforms where services are all sharing memory, we deduct a
penalty from the memory available. On k8s we don't need to do this
because the web, redis, and task containers each have memory
allocated to them.
* Support CPU setting expressed in units used by k8s
This setting has not been getting set by default, and needed some fixing
up to be compatible with setting the CPU resource request/limits in the
same way as we set it in the resource requests/limits.
This way we can start setting this setting in the
operator, and get a more accurate reflection of how much cpu is
available to the control pod in k8s.
Because cpu on k8s can be partial cores, migrate cpu field to decimal.
k8s does not allow granularity of less than 100m (equivalent to 0.1 cores), so only
store up to 1 decimal place.
fix analytics to deal with decimal cpu
need to use DjangoJSONEncoder when Decimal fields in data passed to
json.dumps
There is no current need or use to keep a seperate dependency graph for
each instance group. In the interest of making it clearer what the
current code does, eliminate this superfluous complication.
We are no longer ever referencing any accounting of instance group
capacity, instead we only look
at capacity on intances.
* Select control node before start task
Consume capacity on control nodes for controlling tasks and consider
remainging capacity on control nodes before selecting them.
This depends on the requirement that control and hybrid nodes should all
be in the instance group named 'controlplane'. Many tests do not satisfy that
requirement. I'll update the tests in another commit.
* update tests to use controlplane
We don't start any tasks if we don't have a controlplane instance group
Due to updates to fixtures, update tests to set node type and capacity
explicitly so they get expected result.
* Fixes for accounting of control capacity consumed
Update method is used to account for currently consumed capacity for
instance groups in the in-memory capacity tracking data structure we initialize in
after_lock_init and then update via calculate_capacity_consumed (both in
task_manager.py)
Also update fit_task_to_instance to consider control impact on instances
Trust that these functions do the right thing looking for a
node with capacity, and cut out redundant check for the whole group's
capacity per Alan's reccomendation.
* Refactor now redundant code
Deal with control type tasks before we loop over the preferred instance
groups, which cuts out the need for some redundant logic.
Also, fix a bug where I was missing assigning the execution node in one case!
* set job explanation on tasks that need capacity
move the job explanation for jobs that need capacity to a function
so we can re-use it in the three places we need it.
* project updates always run on the controlplane
Instance group ordering makes no sense on project updates because they
always need to run on the control plane.
Also, since hybrid nodes should always run the control processes for the
jobs running on them as execution nodes, account for this when looking for a
execution node.
* fix misleading message
the variables and wording were both misleading, fix to be more accurate
description in the two different cases where this log may be emitted.
* use settings correctly
use settings.DEFAULT_CONTROL_PLANE_QUEUE_NAME instead of a hardcoded
name
cache the controlplane_ig object during the after lock init to avoid
an uneccesary query
eliminate mistakenly duplicated AWX_CONTROL_PLANE_TASK_IMPACT and use
only AWX_CONTROL_NODE_TASK_IMPACT
* add test for control capacity consumption
add test to verify that when there are 2 jobs and only capacity for one
that one will move into waiting and the other stays in pending
* add test for hybrid node capacity consumption
assert that the hybrid node is used for both control and execution and
capacity is deducted correctly
* add test for task.capacity_type = control
Test that control type tasks have the right capacity consumed and
get assigned to the right instance group
Also fix lint in the tests
* jobs_running not accurate for control nodes
We can either NOT use "idle instances" for control nodes, or we need
to update the jobs_running property on the Instance model to count
jobs where the node is the controller_node.
I didn't do that because it may be an expensive query, and it would be
hard to make it match with jobs_running on the InstanceGroup which
filters on tasks assigned to the instance group.
This change chooses to stop considering "idle" control nodes an option,
since we can't acurrately identify them.
The way things are without any change, is we are continuing to over consume capacity on control nodes
because this method sees all control nodes as "idle" at the beginning
of the task manager run, and then only counts jobs started in that run
in the in-memory tracking. So jobs which last over a number of task
manager runs build up consuming capacity, which is accurately reported
via Instance.consumed_capacity
* Reduce default task impact for control nodes
This is something we can experiment with as far as what users
want at install time, but start with just 1 for now.
* update capacity docs
Describe usage of the new setting and the concept of control impact.
Co-authored-by: Alan Rominger <arominge@redhat.com>
Co-authored-by: Rebeccah <rhunter@redhat.com>
Modify usage of ansible_facts on advanced search, once `ansible_facts`
key is selected render a text input allowing the user to type special
query expected for ansible_facts.
This change will add more flexibility to the usage of ansible_facts when
creating a smart inventory.
See: https://github.com/ansible/awx/issues/11017
--- Removed all callback functions from 'jobs.py' and put them in a new file '/awx/main/tasks/callback.py'
--- Modified Unit tests unit moved
--- Moved 'update_model' from jobs.py to /awx/main/utils/update_model.py
Bump react scripts to 5.0
See: https://github.com/ansible/awx/issues/11543
Bump eslint
Bump eslint and related plugins
Add @babe/core
Add @babe/core remove babel/core.
Rename .eslintrc to .eslintrc.json
Rename .eslintrc to .eslintrc.json
Add extra plugin
Move babe-plugin-macro as dev dependencies
Move babe-plugin-macro as dev dependencies
Add preset-react
Add preset-react
Fixing lint errors
Fixing lint errors
Run eslint --fix
Run eslint --fix
Turn no-restricted-exports off
Turn no-restricted-exports off
Revert "Run eslint --fix"
This reverts commit e760885b6c199f2ca18091088cb79bfa77c1d3ed.
Run --fix
Run --fix
Fix lint errors
Also bump specificity of Select CSS border component to avoid bug of
missing borders.
Also update API tests related to lincenses.
Modifying workflows to install python for make commands
Squashing CI tasks to remove repeated steps
Modifying pre-commit.sh to not fail if there are no python file changes
The most notable change here is the removal of the conditional in
process_request. I don't know why we were preferring REQUEST_URI over
PATH_INFO. When the app is running at /, they are always the same as far as I
can tell. However, when using SCRIPT_NAME, this was incorrectly setting path and
path_info to /myprefix/myprefix/.
* Update saml.md
- Updated link to python documentation
- Added instructions for superadmin permissions
Co-authored-by: John Westcott IV <john.westcott.iv@redhat.com>
vars in ansible/instantiate-awx-deployment.yml in awx-operator repo appear to have been updated, because when we used the `tower_...` vars, they did not apply
It has no way of knowing whether a later command will fix the
situation, and this will come up in the installer. Let's just trust
the pre-flight checks.
Django ORM method get_or_create() does not call save() directly,
but it calls the create() [1].
The create method ignores the skip_update=True option, which then
will trigger a project update, however the EE was not yet created
in the database.
To avoid this problem, we just check the existence of the default
project and creates it with save(skip_update=True) manually.
* update StatusLabels on job detail
* change StatusIcon to use PF circle icons
* change status icon to status label on host event modal
* update status label on wf job output
* update tests for status label changes
* fix default status icon color
Extend the timeout, assuming that we want to let the kubernetes scheduler
start containers when it wants to start them. This allows us to make
resource requests knowing that when some jobs queue up waiting for
resources, they will not get reaped in as short of a
timeout.
This requires corresponding ansible-runner changes
which are only available in devel branch
to do this, requirements are changed
to install ansible-runner devel as it did before
Revert "Use ansible-runner 2.1.1 build"
This reverts commit f0ede01017.
Add back in change from updater.sh that we want to keep
--- Added 3 new sub-package : awx.main.tasks.system , awx.main.tasks.jobs , awx.main.tasks.receptor
--- Modified the functional tests and unit tests accordingly
* Adding SAML option in SAML configuration to specify system auditor and system superusers by role or attribute
* Adding keycloak container and documentation on how to start keycloak alongside AWX (including configuration of both)
Events were passed to `handleRelaunch` and those events structure were
not parseable to JSON - breaking the relaunch of jobs. React 17 changes
made this bug visible.
Also, remove withRouter from LaunchButton.
See: https://github.com/ansible/awx/issues/11452
- the list, detail, and health check API views should not include them
- the Instance-InstanceGroup association views should not allow them
to be changed
- the ping view excludes them
- list_instances management command excludes them
- Instance.set_capacity_value sets hop nodes to 0 capacity
- TaskManager will exclude them from the nodes available for job execution
- TaskManager.reap_jobs_from_orphaned_instances will consider hop nodes
to be an orphaned instance
- The apply_cluster_membership_policies task will not manipulate hop nodes
- get_broadcast_hosts will ignore hop nodes
- active_count also will ignore hop nodes
This will allow us to control the default container group created via settings, meaning
we could set this in the operator and the default container group would get created with it applied.
We need this for https://github.com/ansible/awx-operator/issues/242
Deepmerge the default podspec and the override
With out this, providing the `spec` for the podspec would override everything
contained, which ends up including the container used, which is not desired
Also, use the same deepmerge function def, as the code seems to be copypasted from
the utils
* Correct syntax errors & add back lost last line for messages.po
* Manually sort through es & nl translated strings
* Mnaually sort through french strings and correct syntax errors
Signed-off-by: Christian M. Adams <chadams@redhat.com>
Load ansible.cfg from the branch specified on job template (i.e. the same branch that the playbook exists), not from the branch set in the "project".
Signed-off-by: notok <noto.kazufumi@gmail.com>
* Allow for customizing the receptor image
* Hook in receptor image to docker-compose template
* Fix missing -e to pass into Dockerfile playbook
* Add some docs
Make sure to use a different session cookie name then the default, to
avoid overlapping cookies with other django apps that might be running.
Signed-off-by: Paul Belanger <pabelanger@redhat.com>
I broke everything in https://github.com/ansible/awx/pull/11242.
These changes were necessary in order to run `awx-manage collectstatic` without a running database.
* Note in the job lifecycle when the controller_node and execution_node
are chosen. This event occurs most commonly in the task manager with a
couple of exceptions that happen when we dynamically create dependenct
jobs on the fly in tasks.py
Allows the user to visit login page when the login redirect url is set.
Also, redirects to login page once logging out and there is session from
a SAML available.
See: https://github.com/ansible/awx/issues/11012
* Unsure exactly why this happens but there seems to be a race condition
related to the time window between Receptor work_results and work
release. This sleep extends that window and hopefully avoids the race
condition.
Add new logic to cleanup orphaned work units
from administrative tasks
Remove noisy log which is often irrelevant
about running-cleanup-on-execution-nodes
we already have other logs for this
Skip checking the health of a mesh instance when the instance is not registered
with the application. This prevents encountering an 'UnbouncLocalError' when
running the application attached to a multi-use Receptor mesh network
Signed-off-by: Ethan Paul <24588726+enpaul@users.noreply.github.com>
For running `make docker-compose` a working version of openssl is
required for successfully generating Private RSA key for signing work.
Signed-off-by: Daniel Ziegenberg <daniel@ziegenberg.at>
Use the slack_sdk instead of the deprecated slackclient. Because according to the official documentation:
> The slackclient PyPI project is in maintenance mode now and slack-sdk project is the successor.
With this commit one UPGRADE BLOCKER from requirements/requirements.in is removed. Als the license for slack_sdk
is updated and unit tests for slack notifications backend are added.
Signed-off-by: Daniel Ziegenberg <daniel@ziegenberg.at>
The eslint and jsconfig files are needed to start the dev server.
Without the jsconfig, the ui development server can't resolve src
modules and will fail to start.
Primary changes are:
- Generalized variable names (remove "docker")
- Add explicit "push" variable rather than checking if the "registry" variable is defined.
- Allow for passing in version as build arg
In 4.1+ / AAP 2.1+, isolated groups should be converted into plain
instance groups, and it's desirable for the old ones to stick around
since they'll likely be tied to a bunch of job templates. We do not
want to make the users have to reconstruct those relationships.
* Primary development of integrating runner cleanup command
* Fixup image cleanup signals and their tests
* Use alphabetical sort to solve the cluster coordination problem
* Update test to new pattern
* Clarity edits to interface with ansible-runner cleanup method
* Another change corresponding to ansible-runner CLI updates
* Fix incomplete implementation of receptor remote cleanup
* Share receptor utils code between worker_info and cleanup
* Complete task logging from calling runner cleanup command
* Wrap up unit tests and some contract changes that fall out of those
* Fix bug in CLI construction
* Fix queryset filter bug
-- Updated devel build to take most recent receptor binary
-- Added signWork parameter when sedning job to receptor
-- Modified docker-compose tasks to generate RSA key pair to use for work-signing
-- Modified docker-compose templates and jinja templates for implementing work-sign
-- Modified Firewall rules on the receptor jinja config
Add firewall rules to dev env
<!-- Issues are for **concrete, actionable bugs and feature requests** only - if you're just asking for debugging help or technical support, please use:
- Hello, we think your question is answered in our FAQ. Does this: https://www.ansible.com/products/awx-project/faq cover your question?
- You can find the latest documentation here: https://ansible.readthedocs.io/projects/awx/en/latest/userguide/index.html
## PRs/Issues
### Visit the Forum or Matrix
- Hello, this appears to be less of a bug report or feature request and more of a question. Could you please ask this on either the [Ansible AWX channel on Matrix](https://matrix.to/#/#awx:ansible.com) or the [Ansible Community Forum](https://forum.ansible.com/tag/awx)?
### Denied Submission
- Hi! \
\
Thanks very much for your submission to AWX. It means a lot to us that you have taken time to contribute. \
\
At this time we do not want to merge this PR. Our reasons for this are: \
\
(A) INSERT ITEM HERE \
\
Please know that we are always up for discussion but this project is very active. Because of this, we're unlikely to see comments made on closed PRs, and we lock them after some time. If you or anyone else has any further questions, please let us know by using any of the communication methods listed in the page below: \
\
https://github.com/ansible/awx/#get-involved \
\
In the future, sometimes starting a discussion on the development list prior to implementing a feature can make getting things included a little easier, but it is not always necessary. \
\
Thank you once again for this and your interest in AWX!
### No Progress Issue
- Hi! \
\
Thank you very much for for this issue. It means a lot to us that you have taken time to contribute by opening this report. \
\
On this issue, there were comments added but it has been some time since then without response. At this time we are closing this issue. If you get time to address the comments we can reopen the issue if you can contact us by using any of the communication methods listed in the page below: \
\
https://github.com/ansible/awx/#get-involved \
\
Thank you once again for this and your interest in AWX!
### No Progress PR
- Hi! \
\
Thank you very much for your submission to AWX. It means a lot to us that you have taken time to contribute. \
\
On this PR, changes were requested but it has been some time since then. We think this PR has merit but without the requested changes we are unable to merge it. At this time we are closing your PR. If you get time to address the changes you are welcome to open another PR or we can reopen this PR upon request if you contact us by using any of the communication methods listed in the page below: \
\
https://github.com/ansible/awx/#get-involved \
\
Thank you once again for this and your interest in AWX!
### Red Hat Support Team
- Hi! \
\
It appears that you are using an RPM build for RHEL. Please reach out to the Red Hat support team and submit a ticket. \
\
Here is the link to do so: \
\
https://access.redhat.com/support \
\
Thank you for your submission and for supporting AWX!
## Common
### Give us more info
- Hello, we'd love to help, but we need a little more information about the problem you're having. Screenshots, log outputs, or any reproducers would be very helpful.
### Code of Conduct
- Hello. Please keep in mind that Ansible adheres to a Code of Conduct in its community spaces. The spirit of the code of conduct is to be kind, and this is your friendly reminder to be so. Please see the full code of conduct here if you have questions: https://docs.ansible.com/projects/ansible/latest/community/code_of_conduct.html
### EE Contents / Community General
- Hello. The awx-ee contains the collections and dependencies needed for supported AWX features to function. Anything beyond that (like the community.general package) will require you to build your own EE. For information on how to do that, see https://docs.ansible.com/projects/builder/en/stable/ \
\
The Ansible Community is looking at building an EE that corresponds to all of the collections inside the ansible package. That may help you if and when it happens; see https://github.com/ansible-community/community-topics/issues/31 for details.
## Mailing List Triage
### Create an issue
- Hello, thanks for reaching out on list. We think this merits an issue on our GitHub, https://github.com/ansible/awx/issues. If you could open an issue up on GitHub it will get tagged and integrated into our planning and workflow. All future work will be tracked there. Issues should include as much information as possible, including screenshots, log outputs, or any reproducers.
### Create a Pull Request
- Hello, we think your idea is good! Please consider contributing a PR for this following our contributing guidelines: https://github.com/ansible/awx/blob/devel/CONTRIBUTING.md
### Receptor
- You can find the receptor docs here: https://docs.ansible.com/projects/receptor/en/latest/
- Hello, your issue seems related to receptor. Could you please open an issue in the receptor repository? https://github.com/ansible/receptor. Thanks!
### Ansible Engine not AWX
- Hello, your question seems to be about Ansible development, not about AWX. Try asking on in the Forum https://forum.ansible.com/tag/development
- Hello, your question seems to be about using Ansible Core, not about AWX. https://forum.ansible.com/tag/ansible-core is the best place to visit for user questions about Ansible. Thanks!
### Ansible Galaxy not AWX
- Hey there. That sounds like an FAQ question. Did this: https://www.ansible.com/products/awx-project/faq cover your question?
We'd be happy to help if you can reproduce this with AWX since we do not have Oracle's Linux Automation Manager. If you need help with this specific version of Oracles Linux Automation Manager you will need to contact your Oracle for support.
### Community Resolved
Hi,
We are happy to see that it appears a fix has been provided for your issue, so we will go ahead and close this ticket. Please feel free to reopen if any other problems arise.
<name of community member who helped> thanks so much for taking the time to write a thoughtful and helpful response to this issue!
### AWX Release
Subject: Announcing AWX Xa.Ya.za and AWX-Operator Xb.Yb.zb
- Hi all, \
\
We're happy to announce that the next release of AWX, version <b>`Xa.Ya.za`</b> is now available! \
In addition AWX Operator version <b>`Xb.Yb.zb`</b> has also been released! \
Hi there! We're excited to have you as a contributor.
Have questions about this document or anything not covered here? Come chat with us at `#ansible-awx` on irc.libera.chat, or submit your question to the [mailing list](https://groups.google.com/forum/#!forum/awx-project).
Have questions about this document or anything not covered here? Create a topic using the [AWX tag on the Ansible Forum](https://forum.ansible.com/tag/awx).
## Table of contents
@@ -17,19 +17,21 @@ Have questions about this document or anything not covered here? Come chat with
- [Building API Documentation](#building-api-documentation)
- [Accessing the AWX web interface](#accessing-the-awx-web-interface)
- [Purging containers and images](#purging-containers-and-images)
- [Pre commit hooks](#pre-commit-hooks)
- [What should I work on?](#what-should-i-work-on)
- All code submissions are done through pull requests against the `devel` branch.
- You must use `git commit --signoff` for any commit to be merged, and agree that usage of --signoff constitutes agreement with the terms of [DCO 1.1](./DCO_1_1.md).
- Take care to make sure no merge commits are in the submission, and use `git rebase` vs `git merge` for this reason.
- If collaborating with someone else on the same branch, consider using `--force-with-lease` instead of `--force`. This will prevent you from accidentally overwriting commits pushed by someone else. For more information, see https://git-scm.com/docs/git-push#git-push---force-with-leaseltrefnamegt
- If submitting a large code change, it's a good idea to join the `#ansible-awx` channel on irc.libera.chat, and talk about what you would like to do or add first. This not only helps everyone know what's going on, it also helps save time and effort, if the community decides some changes are needed.
- We ask all of our community members and contributors to adhere to the [Ansible code of conduct](http://docs.ansible.com/ansible/latest/community/code_of_conduct.html). If you have questions, or need assistance, please reach out to our community team at [codeofconduct@ansible.com](mailto:codeofconduct@ansible.com)
- If collaborating with someone else on the same branch, consider using `--force-with-lease` instead of `--force`. This will prevent you from accidentally overwriting commits pushed by someone else. For more information, see [git push docs](https://git-scm.com/docs/git-push#git-push---force-with-leaseltrefnamegt).
- If submitting a large code change, it's a good idea to create a [forum topic tagged with 'awx'](https://forum.ansible.com/tag/awx), and talk about what you would like to do or add first. This not only helps everyone know what's going on, it also helps save time and effort, if the community decides some changes are needed.
- We ask all of our community members and contributors to adhere to the [Ansible code of conduct](https://docs.ansible.com/projects/ansible/latest/community/code_of_conduct.html). If you have questions, or need assistance, please reach out to our community team at [codeofconduct@ansible.com](mailto:codeofconduct@ansible.com)
## Setting up your development environment
@@ -41,8 +43,7 @@ The AWX development environment workflow and toolchain uses Docker and the docke
Prior to starting the development services, you'll need `docker` and `docker-compose`. On Linux, you can generally find these in your distro's packaging, but you may find that Docker themselves maintain a separate repo that tracks more closely to the latest releases.
For macOS and Windows, we recommend [Docker for Mac](https://www.docker.com/docker-mac) and [Docker for Windows](https://www.docker.com/docker-windows)
respectively.
For macOS and Windows, we recommend [Docker for Mac](https://www.docker.com/docker-mac) and [Docker for Windows](https://www.docker.com/docker-windows) respectively.
For Linux platforms, refer to the following from Docker:
@@ -66,7 +67,7 @@ If you're not using Docker for Mac, or Docker for Windows, you may need, or choo
#### Frontend Development
See [the ui development documentation](awx/ui/CONTRIBUTING.md).
See [the ansible-ui development documentation](https://github.com/ansible/ansible-ui/blob/main/CONTRIBUTING.md).
#### Fork and clone the AWX repo
@@ -78,17 +79,13 @@ See the [README.md](./tools/docker-compose/README.md) for docs on how to build t
### Building API Documentation
AWX includes support for building [Swagger/OpenAPI
documentation](https://swagger.io). To build the documentation locally, run:
AWX includes support for building [Swagger/OpenAPI documentation](https://swagger.io). To build the documentation locally, run:
```bash
(container)/awx_devel$ make swagger
```
This will write a file named `swagger.json` that contains the API specification
in OpenAPI format. A variety of online tools are available for translating
this data into more consumable formats (such as HTML). http://editor.swagger.io
is an example of one such service.
This will write a file named `swagger.json` that contains the API specification in OpenAPI format. A variety of online tools are available for translating this data into more consumable formats (such as HTML). http://editor.swagger.io is an example of one such service.
### Accessing the AWX web interface
@@ -104,21 +101,39 @@ When necessary, remove any AWX containers and images by running the following:
(host)$ make docker-clean
```
### Pre commit hooks
When you attempt to perform a `git commit` there will be a pre-commit hook that gets run before the commit is allowed to your local repository. For example, python's [black](https://pypi.org/project/black/) will be run to test the formatting of any python files.
While you can use environment variables to skip the pre-commit hooks GitHub will run similar tests and prevent merging of PRs if the tests do not pass.
If you would like to add additional commit hooks for your own usage you can create a directory in the root of the repository called `pre-commit-user`. Any executable file in that directory will be executed as part of the pre-commit hooks. If any of the pre-commit checks fail the commit will be halted. For your convenience in user scripts, a variable called `CHANGED_FILES` will be set with any changed files present in the commit.
## What should I work on?
We have a ["good first issue" label](https://github.com/ansible/awx/issues?q=is%3Aissue+is%3Aopen+label%3A%22good+first+issue%22) we put on some issues that might be a good starting point for new contributors.
Fixing bugs and updating the documentation are always appreciated, so reviewing the backlog of issues is always a good place to start.
For feature work, take a look at the current [Enhancements](https://github.com/ansible/awx/issues?q=is%3Aissue+is%3Aopen+label%3Atype%3Aenhancement).
If it has someone assigned to it then that person is the person responsible for working the enhancement. If you feel like you could contribute then reach out to that person.
Fixing bugs, adding translations, and updating the documentation are always appreciated, so reviewing the backlog of issues is always a good place to start. For extra information on debugging tools, see [Debugging](https://github.com/ansible/awx/blob/devel/docs/debugging.md).
**NOTES**
**NOTE**
> Issue assignment will only be done for maintainers of the project. If you decide to work on an issue, please feel free to add a comment in the issue to let others know that you are working on it; but know that we will accept the first pull request from whomever is able to fix an issue. Once your PR is accepted we can add you as an assignee to an issue upon request.
> If you work in a part of the codebase that is going through active development, your changes may be rejected, or you may be asked to `rebase`. A good idea before starting work is to have a discussion with us in the `#ansible-awx` channel on irc.libera.chat, or on the [mailing list](https://groups.google.com/forum/#!forum/awx-project).
**NOTE**
> If you work in a part of the codebase that is going through active development, your changes may be rejected, or you may be asked to `rebase`. A good idea before starting work is to have a discussion with us in the [Ansible Forum](https://forum.ansible.com/tag/awx).
> If you're planning to develop features or fixes for the UI, please review the [UI Developer doc](https://github.com/ansible/ansible-ui/blob/main/CONTRIBUTING.md).
### Translations
At this time we do not accept PRs for adding additional language translations as we have an automated process for generating our translations. This is because translations require constant care as new strings are added and changed in the code base. Because of this the .po files are overwritten during every translation release cycle. We also can't support a lot of translations on AWX as its an open source project and each language adds time and cost to maintain. If you would like to see AWX translated into a new language please create an issue and ask others you know to upvote the issue. Our translation team will review the needs of the community and see what they can do around supporting additional language.
If you find an issue with an existing translation, please see the [Reporting Issues](#reporting-issues) section to open an issue and our translation team will work with you on a resolution.
> If you're planning to develop features or fixes for the UI, please review the [UI Developer doc](./awx/ui/README.md).
## Submitting Pull Requests
@@ -128,43 +143,27 @@ Here are a few things you can do to help the visibility of your change, and incr
- No issues when running linters/code checkers
- Python: black: `(container)/awx_devel$ make black`
- Javascript: `(container)/awx_devel$ make ui-lint`
- No issues from unit tests
- Python: py.test: `(container)/awx_devel$ make test`
- JavaScript: `(container)/awx_devel$ make ui-test`
- Write tests for new functionality, update/add tests for bug fixes
- Make the smallest change possible
- Write good commit messages. See [How to write a Git commit message](https://chris.beams.io/posts/git-commit/).
It's generally a good idea to discuss features with us first by engaging us in the `#ansible-awx` channel on irc.libera.chat, or on the [mailing list](https://groups.google.com/forum/#!forum/awx-project).
It's generally a good idea to discuss features with us first by engaging on the [Ansible Forum](https://forum.ansible.com/tag/awx).
We like to keep our commit history clean, and will require resubmission of pull requests that contain merge commits. Use `git pull --rebase`, rather than
`git pull`, and `git rebase`, rather than `git merge`.
Sometimes it might take us a while to fully review your PR. We try to keep the `devel` branch in good working order, and so we review requests carefully. Please be patient.
All submitted PRs will have the linter and unit tests run against them via Zuul, and the status reported in the PR.
## PR Checks run by Zuul
Zuul jobs for awx are defined in the [zuul-jobs](https://github.com/ansible/zuul-jobs) repo.
Zuul runs the following checks that must pass:
1.`tox-awx-api-lint`
2.`tox-awx-ui-lint`
3.`tox-awx-api`
4.`tox-awx-ui`
5.`tox-awx-swagger`
Zuul runs the following checks that are non-voting (can not pass but serve to inform PR reviewers):
1.`tox-awx-detect-schema-change`
This check generates the schema and diffs it against a reference copy of the `devel` version of the schema.
Reviewers should inspect the `job-output.txt.gz` related to the check if their is a failure (grep for `diff -u -b` to find beginning of diff).
If the schema change is expected and makes sense in relation to the changes made by the PR, then you are good to go!
If not, the schema changes should be fixed, but this decision must be enforced by reviewers.
When your PR is initially submitted the checks will not be run until a maintainer allows them to be. Once a maintainer has done a quick review of your work the PR will have the linter and unit tests run against them via GitHub Actions, and the status reported in the PR.
## Reporting Issues
We welcome your feedback, and encourage you to file an issue when you run into a problem. But before opening a new issues, we ask that you please view our [Issues guide](./ISSUES.md).
## Getting Help
If you require additional assistance, please submit your question to the [Ansible Forum](https://forum.ansible.com/tag/awx).
For extra information on debugging tools, see [Debugging](./docs/debugging/).
Early versions of AWX did not support seamless upgrades between major versions and required the use of a backup and restore tool to perform upgrades.
Users who wish to upgrade modern AWX installations should follow the instructions at:
As of version 18.0, `awx-operator` is the preferred install/upgrade method. Users who wish to upgrade modern AWX installations should follow the instructions at:
Use the GitHub [issue tracker](https://github.com/ansible/awx/issues) for filing bugs. In order to save time, and help us respond to issues quickly, make sure to fill out as much of the issue template
as possible. Version information, and an accurate reproducing scenario are critical to helping us identify the problem.
Please don't use the issue tracker as a way to ask how to do something. Instead, use the [mailing list](https://groups.google.com/forum/#!forum/awx-project) , and the `#ansible-awx` channel on irc.libera.chat to get help.
Please don't use the issue tracker as a way to ask how to do something. Instead, use the [Ansible Forum](https://forum.ansible.com/tag/awx).
Before opening a new issue, please use the issue search feature to see if what you're experiencing has already been reported. If you have any extra detail to provide, please comment. Otherwise, rather than posting a "me too" comment, please consider giving it a ["thumbs up"](https://github.com/blog/2119-add-reactions-to-pull-requests-issues-and-comment) to give us an indication of the severity of the problem.
@@ -14,7 +14,7 @@ Before opening a new issue, please use the issue search feature to see if what y
When reporting issues for the UI, we also appreciate having screen shots and any error messages from the web browser's console. It's not unusual for browser extensions
and plugins to cause problems. Reporting those will also help speed up analyzing and resolving UI bugs.
### API and backend issues
### API and backend issues
For the API and backend services, please capture all of the logs that you can from the time the problem occurred.
@@ -31,7 +31,7 @@ If your issue isn't considered high priority, then please be patient as it may t
`state:needs_info` The issue needs more information. This could be more debug output, more specifics out the system such as version information. Any detail that is currently preventing this issue from moving forward. This should be considered a blocked state.
`state:needs_review` The issue/pull request needs to be reviewed by other maintainers and contributors. This is usually used when there is a question out to another maintainer or when a person is less familar with an area of the code base the issue is for.
`state:needs_review` The issue/pull request needs to be reviewed by other maintainers and contributors. This is usually used when there is a question out to another maintainer or when a person is less familiar with an area of the code base the issue is for.
`state:needs_revision` More commonly used on pull requests, this state represents that there are changes that are being waited on.
@@ -80,7 +80,7 @@ If any of those items are missing your pull request will still get the `needs_tr
Currently you can expect awxbot to add common labels such as `state:needs_triage`, `type:bug`, `component:docs`, etc...
These labels are determined by the template data. Please use the template and fill it out as accurately as possible.
The `state:needs_triage` label will will remain on your pull request until a person has looked at it.
The `state:needs_triage` label will remain on your pull request until a person has looked at it.
You can also expect the bot to CC maintainers of specific areas of the code, this will notify them that there is a pull request by placing a comment on the pull request.
The comment will look something like `CC @matburt @wwitzel3 ...`.
# The python path needs to be modified so that the tests can find Ansible within the container
# First we will use anything expility set as PYTHONPATH
# Second we will load any libraries out of the virtualenv (if it's unspecified that should be ok because python should not load out of an empty directory)
@@ -321,110 +431,73 @@ symlink_collection:
mkdir -p ~/.ansible/collections/ansible_collections/$(COLLECTION_NAMESPACE)# in case it does not exist
[](https://github.com/ansible/awx/actions/workflows/ci.yml) [](https://codecov.io/github/ansible/awx) [](https://docs.ansible.com/projects/ansible/latest/community/code_of_conduct.html) [](https://github.com/ansible/awx/blob/devel/LICENSE.md) [](https://forum.ansible.com/tag/awx)
> * [Refactoring AWX into a Pluggable, Service-Oriented Architecture](https://forum.ansible.com/t/refactoring-awx-into-a-pluggable-service-oriented-architecture/7404)
> * [Upcoming changes to AWX Operator installation methods](https://forum.ansible.com/t/upcoming-changes-to-awx-operator-installation-methods/7598)
> * [AWX UI and credential types transitioning to the new pluggable architecture](https://forum.ansible.com/t/awx-ui-and-credential-types-transitioning-to-the-new-pluggable-architecture/8027)
AWX provides a web-based user interface, REST API, and task engine built on top of [Ansible](https://github.com/ansible/ansible). It is one of the upstream projects for [Red Hat Ansible Automation Platform](https://www.ansible.com/products/automation-platform).
To install AWX, please view the [Install guide](./INSTALL.md).
To learn more about using AWX, and Tower, view the [Tower docs site](http://docs.ansible.com/ansible-tower/index.html).
To learn more about using AWX, view the [AWX docs site](https://docs.ansible.com/projects/awx/en/latest/).
The AWX Project Frequently Asked Questions can be found [here](https://www.ansible.com/awx-project-faq).
@@ -18,9 +29,9 @@ Contributing
- Refer to the [Contributing guide](./CONTRIBUTING.md) to get started developing, testing, and building AWX.
- All code submissions are made through pull requests against the `devel` branch.
- All contributors must use git commit --signoff for any commit to be merged and agree that usage of --signoff constitutes agreement with the terms of [DCO 1.1](./DCO_1_1.md)
- All contributors must use `git commit --signoff` for any commit to be merged and agree that usage of `--signoff` constitutes agreement with the terms of [DCO 1.1](./DCO_1_1.md)
- Take care to make sure no merge commits are in the submission, and use `git rebase` vs. `git merge` for this reason.
- If submitting a large code change, it's a good idea to join the `#ansible-awx` channel on web.libera.chat and talk about what you would like to do or add first. This not only helps everyone know what's going on, but it also helps save time and effort if the community decides some changes are needed.
- If submitting a large code change, it's a good idea to join discuss via the [Ansible Forum](https://forum.ansible.com/tag/awx). This helps everyone know what's going on, and it also helps save time and effort if the community decides some changes are needed.
Reporting Issues
----------------
@@ -30,12 +41,11 @@ If you're experiencing a problem that you feel is a bug in AWX or have ideas for
Code of Conduct
---------------
We ask all of our community members and contributors to adhere to the [Ansible code of conduct](http://docs.ansible.com/ansible/latest/community/code_of_conduct.html). If you have questions or need assistance, please reach out to our community team at [codeofconduct@ansible.com](mailto:codeofconduct@ansible.com)
We require all of our community members and contributors to adhere to the [Ansible code of conduct](https://docs.ansible.com/projects/ansible/latest/community/code_of_conduct.html). If you have questions or need assistance, please reach out to our community team at [codeofconduct@ansible.com](mailto:codeofconduct@ansible.com)
Get Involved
------------
We welcome your feedback and ideas. Here's how to reach us with feedback and questions:
We welcome your feedback and ideas via the [Ansible Forum](https://forum.ansible.com/tag/awx).
- Join the `#ansible-awx` channel on irc.libera.chat
- Join the [mailing list](https://groups.google.com/forum/#!forum/awx-project)
For a full list of all the ways to talk with the Ansible Community, see the [AWX Communication guide](https://docs.ansible.com/projects/awx/en/latest/contributor/communication.html).
This page lists OAuth 2 utility endpoints used for authorization, token refresh and revoke.
Note endpoints other than `/api/o/authorize/` are not meant to be used in browsers and do not
support HTTP GET. The endpoints here strictly follow
[RFC specs for OAuth2](https://tools.ietf.org/html/rfc6749), so please use that for detailed
reference. Note AWX net location default to `http://localhost:8013` in examples:
## Create Token for an Application using Authorization code grant type
Given an application "AuthCodeApp" of grant type `authorization-code`,
from the client app, the user makes a GET to the Authorize endpoint with
*`response_type`
*`client_id`
*`redirect_uris`
*`scope`
AWX will respond with the authorization `code` and `state`
to the redirect_uri specified in the application. The client application will then make a POST to the
`api/o/token/` endpoint on AWX with
*`code`
*`client_id`
*`client_secret`
*`grant_type`
*`redirect_uri`
AWX will respond with the `access_token`, `token_type`, `refresh_token`, and `expires_in`. For more
information on testing this flow, refer to [django-oauth-toolkit](http://django-oauth-toolkit.readthedocs.io/en/latest/tutorial/tutorial_01.html#test-your-authorization-server).
## Create Token for an Application using Password grant type
Log in is not required for `password` grant type, so a simple `curl` can be used to acquire a personal access token
via `/api/o/token/` with
*`grant_type`: Required to be "password"
*`username`
*`password`
*`client_id`: Associated application must have grant_type "password"
This endpoint allows the client to create multiple hosts and associate them with an inventory. They may do this by providing the inventory ID and a list of json that would normally be provided to create hosts.
This endpoint allows the client to launch multiple UnifiedJobTemplates at a time, along side any launch time parameters that they would normally set at launch time.
*`inventory_update`: ID of the inventory update job that was started.
(integer, read-only)
*`project_update`: ID of the project update job that was started if this inventory source is an SCM source.
(interger, read-only, optional)
(integer, read-only, optional)
Note: All manual inventory sources (source="") will be ignored by the update_inventory_sources endpoint. This endpoint will not update inventory sources for Smart Inventories.
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.