Utilizes the `validate_role_assignment` callback
from dab (see dab PR #490) to prevent granting credential
access to a user of another organization.
This logic will work for role_user_assignments
and role_team_assignments endpoints.
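A minimal sketch of the shape such a callback might take (the hook name comes from dab PR #490; the membership check below is illustrative only):
```python
from rest_framework.exceptions import ValidationError


def validate_role_assignment(self, actor, role_definition, obj):
    # Deny granting credential access across organization boundaries.
    # `actor_in_organization` is a hypothetical helper; the real check
    # lives in the dab callback implementation.
    if obj.organization is None:
        return
    if not actor_in_organization(actor, obj.organization):
        raise ValidationError({'detail': 'You may not grant access to a credential outside of its organization.'})
```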
Signed-off-by: Seth Foster <fosterbseth@gmail.com>
* Added new OpenShift Virtualization inventory source to docs.
* Incorporated review feedback from @fosterseth and @TheRealHaoLiu.
* Fixed link to correct kubevirt.core.kubevirt documentation.
* Add better 403 error message for Job Template create
To create a Job Template you need access to projects and inventory
---------
Co-authored-by: Chris Meyers <chris.meyers.fsu@gmail.com>
* Add initial test for deletion of stale permission
* Delete existing EE view permission
* Hypothetically complete update of EE model permissions setup
* Tests passing locally
* Issue with user_capabilities was a test bug, fixed
* Add TASK_MANAGER_LOCK_TIMEOUT
`TASK_MANAGER_LOCK_TIMEOUT` controls the `idle_in_transaction_session_timeout` and `idle_session_timeout` configuration for task manager connections and the lock in the database.
The hope is to prevent the situation where the task instance holding the lock becomes unresponsive and prevents other instances from running the task manager (see the sketch after this list).
* Add session timeout to periodic scheduler and all sub task manager locks
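A hedged sketch of how the timeout might be applied to the session that holds the task manager lock (names and call site are illustrative; `idle_session_timeout` requires PostgreSQL 14+):
```python
from django.conf import settings
from django.db import connection


def apply_task_manager_timeouts():
    timeout_ms = int(settings.TASK_MANAGER_LOCK_TIMEOUT) * 1000  # assumes seconds
    with connection.cursor() as cursor:
        # If this session goes idle while holding the lock, postgres terminates
        # it, releasing the lock so another instance can run the task manager.
        cursor.execute('SET idle_in_transaction_session_timeout = %s', (timeout_ms,))
        cursor.execute('SET idle_session_timeout = %s', (timeout_ms,))  # PG 14+
```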
Workaround
```
ERROR awx/main/tests/functional/test_licenses.py - pip._vendor.distlib.DistlibException: Unable to locate finder for 'pip._vendor.distlib'
```
* Add migration testing for certain managed roles
* Fix managed role bugs
* Add more tests
* Fix another bug with org workflow admin role reference
* Add test because another issue is fixed
* Mark reason for test
* Remove internal markers
* Reword failure message
Co-authored-by: Seth Foster <fosterseth@users.noreply.github.com>
---------
Co-authored-by: Seth Foster <fosterseth@users.noreply.github.com>
Script was falsely identifying cross-linked
parents. It needs to check parent roles only if the
content type is Team and role_field is
member_role OR admin_role.
Signed-off-by: Seth Foster <fosterbseth@gmail.com>
* Include a bit of context in the name of the delete function. The
prepended HTTP_ string may be unexpected if Django's header
transformation isn't top of mind.
rename AWX_DIRECT_SHARED_RESOURCE_MANAGEMENT_ENABLED
to
ALLOW_LOCAL_RESOURCE_MANAGEMENT
- clearer meaning
- drop prefix so the same setting is used across the platform
Signed-off-by: Seth Foster <fosterbseth@gmail.com>
This is actually happening for one customer, though it seems like it
shouldn't be if the foreign key constraint is set back up properly.
In order to recreate it, I had to add the constraint back with 'NOT
VALID' added on to prevent the check.
* Periodically sync from the shared resource provider
- add periodic task `periodic_resource_sync`, run once every 15 min
- if `RESOURCE_SERVER` is not configured, the sync will not run
- runs on only 1 node
example RESOURCE_SERVER configuration
```
RESOURCE_SERVER = {
"URL": "<resource server url>",
"SECRET_KEY": "<resource server auth token>",
"VALIDATE_HTTPS": <True/False>,
}
RESOURCE_SERVICE_PATH = <resource_service_path>
```
If more than one schedule for a unified job template
is removed at once, a race condition can arise.
example scenario: delete schedules with ids 7 and 8
- unified job template next_schedule is currently 7
- on delete of schedule 7, update_computed_fields will try to set
next_schedule to 8
- but while this logic is occurring, another transaction
is deleting 8
This leads to a db IntegrityError
The solution here is to call select_for_update() on the
next schedule, so that 8 cannot be deleted until
the transaction for deleting 7 is completed.
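A minimal sketch of the fix, with model and field names assumed from the description above:
```python
from django.db import transaction


def update_computed_fields(ujt):
    with transaction.atomic():
        next_schedule = (
            ujt.schedules.filter(enabled=True)
            .select_for_update()  # schedule 8 cannot be deleted until we commit
            .order_by('next_run')
            .first()
        )
        ujt.next_schedule = next_schedule
        ujt.save(update_fields=['next_schedule'])
```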
Signed-off-by: Seth Foster <fosterbseth@gmail.com>
It looks like we can't do upserts currently without dropping to raw
SQL, but if we wrap each batch in a transaction, that should ensure
that each is updated with the correct count.
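Roughly the shape of that approach (names here are illustrative):
```python
from django.db import transaction

for batch in batches:  # `batches`: an iterable of object lists (hypothetical)
    with transaction.atomic():
        # All rows in the batch commit together, so each ends up with a
        # count consistent with the rest of its batch.
        for obj in batch:
            obj.count = compute_count(obj)  # hypothetical helper
            obj.save(update_fields=['count'])
```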
PG_TLS=true make docker-compose
This will add some extra startup commands
for the postgres container to generate a key and
cert to use for postgres connections.
It will also mount in pgssl.conf which has ssl configuration.
This can be useful for debugging issues that only surface
when using ssl postgres connections.
* Prevent modifying shared resources
Adds a class decorator to prevent modifying shared resources
when gateway is being used.
AWX_DIRECT_SHARED_RESOURCE_MANAGEMENT_ENABLED is the setting
to enable/disable this feature.
Works by overriding these view methods:
- create
- delete
- perform_update
create and delete are overridden to raise a
PermissionDenied exception.
perform_update is overridden to check if any shared
fields are being modified, and raise a PermissionDenied
exception if so.
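A hedged sketch of the decorator's shape (the overridden method names come from this description; everything else is illustrative):
```python
from django.conf import settings
from rest_framework.exceptions import PermissionDenied


def prevent_shared_resource_changes(shared_fields):
    """Illustrative class decorator; active only when the gateway owns shared resources."""

    def decorator(cls):
        # Setting evaluated at decoration time for brevity.
        if settings.AWX_DIRECT_SHARED_RESOURCE_MANAGEMENT_ENABLED:
            return cls  # direct management allowed; leave the view untouched

        def create(self, request, *args, **kwargs):
            raise PermissionDenied('Shared resources must be managed via the gateway.')

        def delete(self, request, *args, **kwargs):
            raise PermissionDenied('Shared resources must be managed via the gateway.')

        original_perform_update = cls.perform_update

        def perform_update(self, serializer):
            if any(f in serializer.validated_data for f in shared_fields):
                raise PermissionDenied('Cannot modify shared fields via this API.')
            return original_perform_update(self, serializer)

        cls.create, cls.delete, cls.perform_update = create, delete, perform_update
        return cls

    return decorator
```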
Additional changes:
Prevent sso conf from registering external authentication related settings if
AWX_DIRECT_SHARED_RESOURCE_MANAGEMENT_ENABLED is False
Signed-off-by: Seth Foster <fosterbseth@gmail.com>
Co-authored-by: Hao Liu <44379968+TheRealHaoLiu@users.noreply.github.com>
Support for AWS SNS notifications. SNS is a widespread service that is used to integrate with other AWS services (e.g. Lambdas). This support unlocks use cases like triggering Lambda functions, especially when AWX is deployed on EKS.
Decisions:
Data Structure
- I preferred using the same structure as Webhook for message body data because it contains all job details. For now, I directly linked to Webhook to avoid duplication, but I am open to suggestions.
AWS authentication
- To support non-AWS native environments, I added configuration options for AWS secret key, ID, and session tokens. When entered, these values are supplied to the underlying boto3 SNS client. If not entered, it falls back to the default authentication chain to support the native AWS environment. Properly configured EKS pods are created with temporary credentials that the default authentication chain can pick up automatically.
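A sketch of that credential fallback (boto3's default chain takes over when no explicit keys are passed):
```python
import boto3


def get_sns_client(region_name, access_key=None, secret_key=None, session_token=None):
    if access_key and secret_key:
        # Explicit credentials supplied in the notification configuration.
        return boto3.client(
            'sns',
            region_name=region_name,
            aws_access_key_id=access_key,
            aws_secret_access_key=secret_key,
            aws_session_token=session_token,
        )
    # No explicit credentials: fall back to the default chain (env vars,
    # shared config, or the temporary credentials injected into an EKS pod).
    return boto3.client('sns', region_name=region_name)
```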
---------
Signed-off-by: Ethem Cem Ozkan <ethemcem.ozkan@gmail.com>
* Always output awx logs to a file via otel
* That log file can always be replayed later into a product that
supports otlp.
* Useful when you find a problem that you need a time series DB to help
find and solve.
* Useful if a community member or customer has a problem where a time
series db would be helpful. You can take a "remote" users log and
replay it locally for analysis.
* Otherwise, settings value changes bleeds over into other tests.
* Remove django.conf settings import so that we do not accidentally
forget to use the settings fixture.
- switch to the galaxy search API for determining if the version we want to publish already exists
- switch from a github action variable to an env var for easier copy and paste testing
We have not identified the root cause of the wsrelay failure, but attempting to make wsrelay restart itself resulted in postgres and redis connection leaks. We were not able to fully identify where the redis connection leak comes from, so we are reverting back to failing; removing startsecs 30 will prevent wsrelay from going FATAL
```
ERRO[0000] path "/var/lib/awx/.config" exists and it is not owned by the current user
```
This started happening with podman 5.
It seems the config files are no longer needed; removing them fixes the problem.
Skip the update parent logic for 'waiting' on UnifiedJob.
By not looking up "status_before" from the previous instance
we save 2 to 3 expensive calls (the self lookup of old state, the lookup
of the parent, and the update to the parent if allow_simultaneous == False or status == 'waiting').
This change makes "wait: true" for jobs and syncs
look at the event_processing_finished instead of
finished field.
Right now there is a race condition where
a module might try to delete an inventory, but the events
for an inventory sync have not yet finished. We have a
RelatedJobsPreventDeleteMixin that checks for this condition.
Bulk jobs don't have event_processing_finished, so we just
use the finished field in that case.
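In module terms, the change boils down to picking the field to poll on; a hedged sketch:
```python
def wait_target_field(job):
    # Bulk jobs have no event_processing_finished; fall back to finished.
    if 'event_processing_finished' in job:
        return 'event_processing_finished'
    return 'finished'
```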
openssl 3.2.0 has incompatibility issues with
the libpq version we are using, and causes
some C runtime errors:
"double free or corruption (out)"
see awx issue #15136
also this issue
github.com/conan-io/conan-center-index/pull/22615
once the libpq libraries on centos stream9 are
updated with the patch, we can unpin openssl
Signed-off-by: Seth Foster <fosterbseth@gmail.com>
If we don't have something in the cache when we call
get_by_natural_key, do an actual filtered query for it and cache the
results. We'll get more overall API calls this way, but they'll be
smaller and will happen while we are importing, not upfront.
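A hypothetical sketch of that cache-miss path (not the actual awxkit internals):
```python
def get_by_natural_key(self, key):
    cache_key = tuple(sorted(key.items()))
    if cache_key not in self._cache:
        # Miss: run a filtered query for just this object while importing,
        # instead of fetching everything upfront.
        results = self.api.list(key['type'], name=key['name'])  # hypothetical client
        self._cache[cache_key] = results[0] if results else None
    return self._cache[cache_key]
```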
* Adding CSRF validation for schemas
* Changing schema retrieval to avoid importing a new library
* Check if CSRF_TRUSTED_ORIGINS exists before accessing it
---------
Signed-off-by: Bruno Sanchez <brsanche@redhat.com>
Currently the association box displays a
list of available instances/addresses that can
be peered to.
The pagination issue arises as follows:
- fetch 5 instances (based on page_size)
- filter these instances down based on some
criteria (like is_internal: false)
- show results
Filtering down the results inside of the fetch
method results in pagination errors (it may return fewer than
5, for example).
Instead, do the filtering via API queries. That way the
pagination count will be correct.
Signed-off-by: Seth Foster <fosterbseth@gmail.com>
It's the year 2024: using -k by default in https URL schemes should be deprecated. (I have left one mention of it possibly being required if no CA is available.) Furthermore, neither -XGET nor -XPOST is required, as curl(1) knows when to use which method.
- when re-establishing the connection to the db, close the old connection
- re-initialize WebSocketRelayManager when restarting asyncio.run
- log and ignore errors in cleanup_offline_host (this might come back to bite us)
- clean up the connection when WebSocketRelayManager crashes
* Fix bug where team could not be given read_role to other team
* Avoid unwanted triggers of parentage granting
* Restructure signal structure
* Fix another bug unmasked by team member permission fix
* Changes to live with test writing
* Use equality as opposed to string "in"
from Seth in review comment
Co-authored-by: Seth Foster <fosterseth@users.noreply.github.com>
---------
Co-authored-by: Seth Foster <fosterseth@users.noreply.github.com>
Adds new modules for CRUD operations on the
following endpoints:
- api/v2/role_definitions
- api/v2/role_user_assignments
- api/v2/role_team_assignments
Note: assignment is Create or Delete only
Additional changes:
- Currently DAB endpoints do not have "type"
field on the resource list items. So this modifies
the create_or_update_if_needed to allow manually
specifying item type.
Signed-off-by: Seth Foster <fosterbseth@gmail.com>
* Add new enablement settings from DAB RBAC
* Initial implementation of system auditor as role without testing
* Fix system auditor role, remove duplicate assignments
* Make the system auditor role managed
* Flake8 fix
* Remove another thing from old solution
* Fix a few test failures
* Add extra setting to disable custom system roles via API
* Add test for custom role prohibition
Develop ability to list permissions for existing roles
Create a model registry for RBAC-tracked models
Write the data migration logic for creating
the preloaded role definitions
Write migration to migrate old Role into ObjectRole model
This loops over the old Role model, knowing it is unique
on object and role_field
Most of the logic is concerned with identifying the
needed permissions, and then the corresponding role definition.
As needed, object roles are created, and users, then teams,
are assigned.
Write re-computation of cache logic for teams
and then for object role permissions
Migrate new RBAC internals to ansible_base
Migrate tests to ansible_base
Implement solution for visible_roles
Expose URLs for DAB RBAC
* Before, the optional url prefix feature required calling our
versioning version of reverse(). This worked _ok_ until we added more
and more urls from 3rd party apps. Those 3rd party apps do not call
our reverse(), rightfully so.
* This implementation looks at the incoming request path. If it includes
the special optional prefix url, then we register ALL the urls WITH
the optional url prefix.
If the incoming request path does NOT contain the optional url prefix,
then we register ALL the urls WITHOUT the optional url prefix.
* Before this, we were registering BOTH sets of urls and then used reverse()
plus the request as context to decide which url to serve.
* Middleware classes can be instantiated multiple times in testing. To
make this a non-issue, move the init code for named urls out of the
middleware init and into the app init.
* This makes it easier to use other testing facilities, like
LiveServerTestCase, without having to mock the named url middleware
init.
The promote workflow recently failed. Since this was just a problem with our automation, it would be nice if we didn't have to do another release just to fix our tooling.
* Stage multi-arch awx image
- change CI to use `make awx-kube-build` instead of build playbook
- update staging CI to build and push multiarch awx image
- update doc to use `make awx-kube-build` to build awx image
- remove build playbook (no longer used)
* `drf_reverse()` was introduced here 1a75b1836e
* There is a comment about monkey patching. I can't find the monkey patch it is referencing.
* AWX `drf_reverse()` is a copy paste of this https://github.com/encode/django-rest-framework/blob/master/rest_framework/reverse.py#L32
* The only difference is DRF's version calls `preserve_builtin_query_params()`
* `preserve_builtin_query_params()` only does something if `api_settings.URL_FORMAT_OVERRIDE` is defined.
* We don't use `REST_FRAMEWORK.URL_FORMAT_OVERRIDE`
* Added docs for terraform credential/inventory source
* Updated screen captures for inventories and source to match wfjt example
* Added docs for terraform credential/inventory source
* Updated screen captures for inventories and source to match wfjt example
* Update docs/docsite/rst/userguide/inventories.rst
Co-authored-by: Aoki <lucasaoki@users.noreply.github.com>
* Revised per review feedback.
* Update docs/docsite/rst/userguide/inventories.rst
Co-authored-by: Helen Bailey <hakbailey@gmail.com>
---------
Co-authored-by: Aoki <lucasaoki@users.noreply.github.com>
Co-authored-by: Helen Bailey <hakbailey@gmail.com>
* Adjust the awx-manage script to make use of importlib
removing the deprecation warning.
* Symlink awx-manage in docker-compose
We no longer need to rebuild the docker-compose devel image to load changes to `tools/docker-compose/awx-manage` in the development environment
---------
Co-authored-by: Hao Liu <44379968+TheRealHaoLiu@users.noreply.github.com>
* Update DOCKER_COMPOSE command
docker-compose will stop being supported soon and this was causing CI flake; set the DOCKER_COMPOSE default to `docker compose`
* Give AWX network a static name
* We didn't really make use of json formatting across the app. Remove
the special case json formatter. Instead, output all of the meta-data
associated with a job lifecycle event every time. Before, we tried to
only output this extra meta data when in DEBUG mode. It turns out this
information is smaller than we thought and more useful than we thought
so always output it.
* Previously, the params were passed without quotes and each directory
was being interpreted as a separate command line flag.
* Added some structure around the error messages returned from
receptorctl so we can more easily decide how to handle each case. For
example, releasing the cleanup job from receptor doesn't absolutely
need to succeed because we have a periodic job that does that. In
fact, that is the thing that is making it fail .. but I digress.
Recent changes in awx and/or django ansible base cause the django
collectstatic command to fail when using an empty settings file.
Instead, use the defaults settings file from controller via
DJANGO_SETTINGS_MODULE=awx.settings.defaults
[linux/amd64 builder 13/13] RUN AWX_SETTINGS_FILE=/dev/null
SKIP_SECRET_KEY_CHECK=yes SKIP_PG_VERSION_CHECK=yes
/var/lib/awx/venv/awx/bin/awx-manage collectstatic --noinput --clear
Traceback (most recent call last):
(...)
django.core.exceptions.ImproperlyConfigured: settings.DATABASES is improperly
configured. Please supply the ENGINE value. Check settings documentation for
more details.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
Fix survey prompt presentation inconsistencies
Remove unnecessary conditional
This conditional always returned true. See the following warning: This condition will always return 'true' since JavaScript compares objects by reference, not value.
Fix schedule edit tests
Modifications to settings
- Add hidden to indicate to UI_NEXT to hide the field
- Add warning_text to indicate to UI_NEXT to display a warning when a specific setting is modified
- Address some non-required fields being marked as required
* Add setting for configuring optional URL prefix for /api
Add OPTIONAL_API_URLPATTERN_PREFIX setting
examples:
- if set to `''` (empty string) API pattern will be `/api`
- if set to 'controller' API pattern will be `/api` AND `/api/controller`
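A sketch of how the setting could expand into url patterns (illustrative, not the exact AWX urlconf):
```python
from django.conf import settings
from django.urls import include, path

urlpatterns = [path('api/', include(api_urls))]  # api_urls: the real API urlconf
if settings.OPTIONAL_API_URLPATTERN_PREFIX:
    # e.g. 'controller' makes the API answer at both /api and /api/controller
    urlpatterns += [
        path(f'api/{settings.OPTIONAL_API_URLPATTERN_PREFIX}/', include(api_urls)),
    ]
```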
* Add dump_auth_config management cmd
- Dump SAML config from AWX to DAB authenticator config in json format
* Add dumping of LDAP settings
* add test for command
* Fix is_enabled
* fix command name typo
Co-authored-by: Hao Liu <44379968+TheRealHaoLiu@users.noreply.github.com>
* add fields to config, add name to data
* break out IDP values
* change test fields and value comparison
* edit help text, reformat settings
---------
Co-authored-by: jessicamack <jmack@redhat.com>
https://github.com/ansible/awx/pull/14910/files
introduced a bug where we no longer accept the right exceptions.
When 2 jobs launch at the same time and both try to create the job event table partition, 1 of the jobs will fail.
* Fixed mismatch between setuptools version in the makefile and requirements file
* Fix mismatch of versions in makefile and requirements
* Added maturin license
Prune dangling images periodically
pairs with https://github.com/ansible/ansible-runner/pull/1342
this fixes the problem where forcefully removing images (when the EE image setting changes while that image is being used by a job) causes the job to fail
* Align Origin and Host headers
* Before this change the Host: header was runserver. Seems to be set by
nginx upstream flow.
* After this change we explicitly set the Host: header
* More about CSRF checks ...
CSRF checks that Origin == Host. Think about how the browser works.
<browser goes to awx.com>
"I'm executing javascript that I downloaded from awx.com (ORIGIN) and
I'm making an XHR POST request to awx.com (HOST)"
Server verifies; Host: header == Origin: header; OK!
vs. the malicious case.
<hacker injects javascript code into google.com>
<browser goes to google.com>
"I'm executing javascript that I downloaded from google.com (ORIGIN)
and I'm making an XHR POST request to awx.com (HOST)"
Server verifies; Host: header != Origin: header; NOT OK!
* Update awx/settings/development.py
---------
Co-authored-by: Hao Liu <44379968+TheRealHaoLiu@users.noreply.github.com>
Enable VSCode debugger integration when attaching VSCode to the AWX docker-compose development environment container
- add debugpy launch target in `.vscode/launch.json` to enable launching awx processes with debugpy
- add vscode tasks in `.vscode/tasks.json` to facilitate shutting down corresponding supervisord managed processes while launching process with debugpy
- modify nginx conf to add django runserver as fallback to uwsgi (enable launching API server via debugpy)
* Credential Lookup with multiple types
Allow looking up a credential with one of multiple type IDs.
* Allow Azure cred for SCM
Allow selecting an Azure Resource Manager credential for Git-based SCMs.
This is in order to enable using Azure Service Principals for project updates.
* Implement Azure Service Principal Git
This adds support for using an Azure Service Principal for project updates.
---------
Signed-off-by: Patrick Uiterwijk <patrick@puiterwijk.org>
Add pip>=21.3 to dev requirements, required for installing django-ansible-base in editable mode
https://peps.python.org/pep-0660/
PEP 660 – Editable installs for pyproject.toml based builds (wheel based)
Do not auto-reload explicitly STOPPED processes
In the development/debug workflow we sometimes explicitly STOP processes; this will make sure auto-reload does not start them back up
* add resources api to controller
* update setting
models are not the source of truth in AWX
* Force creation of ServiceID object in tests
* fix typo
* settings fix for CI
---------
Co-authored-by: Alan Rominger <arominge@redhat.com>
Tried to dig into why we ever needed this and could not find the answer. We removed it and ran all the tests and the tests passed, so we're assuming it's no longer needed.
`pytest awx/main/tests/docs --release=$(VERSION_TARGET)`
where --release is required breaks test discovery and running in vscode (from within the container)
* No harm in adding it to the list. If a JWT auth header is provided,
then process it (valid or not). If a JWT is not provided, move on to
the next auth.
Fix deadlock scenario where a dispatcher child process was stuck in its read-from-queue loop after the dispatcher parent process decided to quit
Co-authored-by: Alan Rominger <arominge@redhat.com>
- avoid looping
- avoid using multiple files, only one should be provided and processed per type
- use first_found and variables to locate existing file
- skip if no file exists
As we do for control nodes, disable the
install_bundle endpoint for ingress nodes.
This can be done by checking if instance managed
is True.
Signed-off-by: Seth Foster <fosterbseth@gmail.com>
* authorized/not authorized tests for wsrelay endpoint
* not authorized test for web browser websockets
* skeleton of a test for authorized web browser websockets
The github workflow that we have set up for branch deletion doesn't work:
- the `on: delete` event does not support the `branches:` filter
- the `mode` flag for the aws_s3 module does not have `delete` as one
of the options. The proper option appears to be `delobj`.
* Channels doesn't really give you an interface to support per-endpoint
auth ... so this adds one.
* The web browser and node <--> node communication have different auth
needs.
* Previously, the nginx location would match on /foo/websocket... or
/foo/api/websocket... Now, we require these two paths to start at the
root i.e. <host>/websocket/... /api/websocket/...
* Note: We now also require an ending / and do NOT support
<host>/websocket_foobar but DO support <host>/websocket/foobar. This
was always the intended behavior. We want to keep
<host>/api/websocket/... "open" and routing to daphne in case we want
to add more websocket urls in the future.
* Always support cookies, session, and also allow rest_framework
configured auth methods over the browser websocket.
* The node -> node websocket auth remains locked down and unchanged
* Add a dev option for updater script to pin CI reqs
* Avoid removing git links for dev requirements
* Add dev to primary options
* Fix up sanitize git switch
This is a non-functional change. The way os_info is populated with docker info
and grep 'Operating System' breaks on podman and likely in other places. This
makes it work on both podman and docker, and it will continue to return the
exact same strings everywhere else.
Our migrations that touch roles tend to bring in our real models via
migration_utils.set_current_apps_for_migrations, and that can have
some undesirable side-effects.
* add ldap_auth mount and configure it
* added in key engines, userpass auth method, still needs testing
* add policies and fix ldap_user
* start awx automation for vault demo and move ldap
* update docs with new flags/new credentials
* Added LDAP support for HashiCorp Vault lookup credential
* Added LDAP support for HashiCorp Vault lookup credential
* Replaced graphics and updated missing fields.
* Added LDAP support for HashiCorp Vault lookup credential
* Replaced graphics and updated missing fields.
* Incorporated review feedback from @thedoubl3j and @djyasin.
* Fix UI peers_from_control_nodes
Fixes bug where peers_from_control_node was
greyed out in UI.
Additional changes:
- Make edit instance button only show for instances
with managed = False
- Make remove instance button only show for instances
with managed = False
- InstanceList selectable only for instances with
managed = False
---------
Signed-off-by: Seth Foster <fosterbseth@gmail.com>
* Organize metrics into their respective services
* Serve per-service metrics on a per-service http server
* Increase prometheus client usage over our custom metrics fields
Listener Addresses is a better name to
emphasize these are routable addresses to
reach a listener service on the node.
Also removed expand toggle on the listener
addresses list items, as the expanded mode
had no additional information.
Signed-off-by: Seth Foster <fosterbseth@gmail.com>
Make protocol be blank on instance if there
is no canonical address for this instance.
It was defaulting to "tcp" before.
Signed-off-by: Seth Foster <fosterbseth@gmail.com>
In receptor address post-save method:
- Fixed detecting if address was missing
a link from control nodes
- Use InstanceLink create_or_update to prevent
adding duplicate InstanceLink source and target
peers
In instance serializer create_or_update,
delete receptor addresses first before doing
instance create or update. This ensures that we don't
trigger unnecessary post-save methods that might
attempt to manipulate receptor addresses that
will just be removed later.
Signed-off-by: Seth Foster <fosterbseth@gmail.com>
test_listener_port
test_peers_from_control_nodes
test_peers_from_control_nodes_without_listener_port
are covered in the following tests:
test_no_op
test_creates_canonical_address
test_deletes_canonical_address
test_updates_canonical_address
test_canonical_address_validation_error
Signed-off-by: Seth Foster <fosterbseth@gmail.com>
Add functional test case for inspecting
established receptor connections.
InstanceLink starts in ADDING state, and should
move to ESTABLISHED state if the connection
is detected in the receptor status output.
Signed-off-by: Seth Foster <fosterbseth@gmail.com>
If the port is explicitly set to null (causing any ReceptorAddress to
be deleted), then that's a validation error.
If the port is left off but a ReceptorAddress doesn't already exist,
we should not infer a port number and that is also a validation error.
Adds validation and a unit test to ensure:
- peers_from_control_nodes=True should fail if
listener_port is not set
- peers_from_control_nodes=False should be NOOP if
listener_port is not set
Signed-off-by: Seth Foster <fosterbseth@gmail.com>
Add validation to prevent any managed node
from modifying "peers" through the API
Peering from these nodes should be handled
by setting peers_from_control_nodes only.
Managed nodes are control nodes and
ingress hop nodes.
Signed-off-by: Seth Foster <fosterbseth@gmail.com>
InstanceLink target should not be null.
Should be safe to set to null=False, because we have
a custom RunPython method to explicitly set
target to a proper key.
Also, add new test to test_migrations
which ensures data integrity after migrating
the receptor address model changes.
Signed-off-by: Seth Foster <fosterbseth@gmail.com>
Adds remove_receptor_address to delete a
receptor address from the database
Also, enforce that only 1 canonical address
can be added to an instance via
the add_receptor_address command.
Signed-off-by: Seth Foster <fosterbseth@gmail.com>
- Add forwards method to create a receptor address
for any existing Instance that has listener_port defined
- Add forwards method to modify each InstanceLink object
that changes target to the newly created receptor addresses
This migration was implemented as follows:
1. Add a target_new to InstanceLink which is a foreign key
to ReceptorAddress
2. create receptor addresses
3. link to these receptor addresses using the target_new field
4. rename target_new to target
5. drop listener_port and peers_from_control_nodes from Instance
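A hedged sketch of those steps as Django migration operations (model and field names follow this description; the RunPython bodies and dependency are placeholders):
```python
from django.db import migrations, models


def create_receptor_addresses(apps, schema_editor):
    ...  # step 2: one ReceptorAddress per Instance with a listener_port


def link_to_receptor_addresses(apps, schema_editor):
    ...  # step 3: point each InstanceLink.target_new at the new address


class Migration(migrations.Migration):
    dependencies = [('main', '0001_initial')]  # placeholder

    operations = [
        # step 1: temporary FK alongside the old target field
        migrations.AddField(
            'instancelink', 'target_new',
            models.ForeignKey('main.ReceptorAddress', null=True, on_delete=models.CASCADE),
        ),
        migrations.RunPython(create_receptor_addresses),
        migrations.RunPython(link_to_receptor_addresses),
        migrations.RemoveField('instancelink', 'target'),
        migrations.RenameField('instancelink', 'target_new', 'target'),  # step 4
        migrations.RemoveField('instance', 'listener_port'),             # step 5
        migrations.RemoveField('instance', 'peers_from_control_nodes'),
    ]
```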
Signed-off-by: Seth Foster <fosterbseth@gmail.com>
If an Instance endpoint is patched with
{"peers_from_control_nodes": true}
but a listener_port is not defined on the instance
or in the patch payload, do not create a
receptor address.
Only update or create a receptor address if listener_port
is set, either in the payload or already on the instance.
Signed-off-by: Seth Foster <fosterbseth@gmail.com>
The command has 3 "targets", all of which should be a
ReceptorAddress object:
- peers, disconnect, exact
Add logic to make sure each entry in those lists
is a receptor address.
When creating InstanceLink objects, make sure
target is ReceptorAddress, not an Instance.
Signed-off-by: Seth Foster <fosterbseth@gmail.com>
After removing CRUD from receptor addresses, we need
to remove the module.
- remove receptor_address module
- Add listener_port to instance module
- Add peers_from_control_nodes to instance module
Signed-off-by: Seth Foster <fosterbseth@gmail.com>
Removes ability to directly create and delete
receptor addresses for a given node.
Instead, receptor addresses are created automatically
if listener_port is set on the Instance.
For example, patching a "hop" instance
with {"listener_port": 6667}
will create a canonical receptor address with port
6667.
Likewise, peers_from_control_nodes on the instance
sets the peers_from_control_nodes on the canonical
address (if listener port is also set).
protocol is a read-only field that simply reflects
the canonical address protocol.
Other Changes:
- rename k8s_routable to is_internal
- add protocol to ReceptorAddress
- remove peers_from_control_nodes and listener_port
from Instance model
Signed-off-by: Seth Foster <fosterbseth@gmail.com>
- Remove peer selection on add and edit instance
- Added canonical name and order columns; names are the same on the endpoints and
peers lists
- Other cleanup items
- rename name to instance name
Creates a non-deletable address that acts as
the "main" address for this instance.
All other addresses for that instance must
be non-canonical.
When listener_port on an instance is set, automatically
create a canonical receptor address where:
- address is hostname of instance
- port is listener_port
- canonical is True
Additionally, protocol field is added to instance to
denote the receptor listener protocol to use (ws, tcp).
The receptor config listener information is derived from
the listener_port and protocol information. Having a
canonical address that mirrors the listener_port ensures that
an address exists that matches the receptor config information.
Other changes:
- Add managed field to receptor address.
If managed is True, no fields on this address can be edited
via the API.
If canonical is True, only the address field cannot be edited.
- Add managed field to instance. If managed is True, users
cannot set node_state to deprovisioning (i.e. cannot delete the node).
This changes our mechanism for preventing users from deleting
the mesh ingress hop node.
- Field is_internal is now renamed to k8s_routable
- Add reverse_peers on instance which is a list of instance IDs
that peer to this instance (via an address)
Signed-off-by: Seth Foster <fosterbseth@gmail.com>
Make receptoraddress list views
searchable by "address"
Other changes:
- Add help text to source and target of the
InstanceLink model
Signed-off-by: Seth Foster <fosterbseth@gmail.com>
- Add receptor_address module which allows
users to create addresses for instances
- Update awx_collection functional and integration
tests to support new peering design
Signed-off-by: Seth Foster <fosterbseth@gmail.com>
Updated existing tests to support the
ReceptorAddress model
- cannot peer to self
- cannot peer to node that is already peered to me
- cannot peer to node more than once (via 2+ addresses)
- cannot set is_internal True
Other changes:
Change post save signal to only call
schedule_write_receptor_config() when an actual change is detected.
Make functional tests more robust by
checking for specific validation error in the
response.
I.e. instead of just checking for 400, check for 400
and that the error message corresponds to the
validation we are testing for.
Signed-off-by: Seth Foster <fosterbseth@gmail.com>
- cannot peer to self
- cannot peer to instance that is already peered to self
Other changes:
- ReceptorAddress protocol field restricted to choices: tcp, ws, wss
- fix awx-manage list_instances when instance.last_seen is None
- InstanceLink make source and target unique together
- Add help text to the ReceptorAddress fields
Signed-off-by: Seth Foster <fosterbseth@gmail.com>
- websocket_path can only be set if protocol is ws
- is_internal must be False
- only 1 address per instance can have
peers_from_control_nodes set to True
Signed-off-by: Seth Foster <fosterbseth@gmail.com>
register_peers has inputs:
source: source instance
peers: list of instances the source should peer to
InstanceLink "target" is now expected to be a ReceptorAddress
For each peer, we can just use the first receptor address. If
multiple receptor addresses exist, throw a command error.
Currently this command is only used on VM-deployments, where
there is only a single receptor address per instance, so this
should work fine.
Other changes:
drop listener_port field from Instance. Listener port is now just
"port" on ReceptorAddress
Signed-off-by: Seth Foster <fosterbseth@gmail.com>
group_vars all.yaml changes:
- peer entry has two fields, address and port
- receptor_port is inferred from the first
receptor_address entry that uses protocol tcp
other changes:
ActivityStream now records when receptor_addresses
are peered to
Signed-off-by: Seth Foster <fosterbseth@gmail.com>
- write_receptor_config peers to ReceptorAddress entries
that have peers_from_control_nodes enabled
- peers_from_control_nodes and listener_port removed from Instance model
- peers_from_control_nodes added to ReceptorAddress model
- ReceptorAddress is now unique by address and protocol combination
- Write receptor config task is dispatched upon ReceptorAddress creation
or deletion, and when control node is first created
- InstanceLinkSerializer adds a target_address field and has logic
to grab the instance hostname associated with the peered ReceptorAddress
Signed-off-by: Seth Foster <fosterbseth@gmail.com>
Add post save and post delete hooks to
call write_receptor_config when
a receptor address is added / removed.
Add peers_from_control_nodes to
provision_instance
Signed-off-by: Seth Foster <fosterbseth@gmail.com>
- Add database constraints to make sure addresses
are unique
If port is defined:
address, port, protocol, websocket_path are unique together
if port is not defined:
address, protocol, websocket_path are unique together
- Allow deleting address via API
- Add ReceptorAddressAccess to determine permissions
- awx-manage add_receptor_address returns changed: True
if successful
* Not many, if any, folks use the notebook feature. It kind of goes in
and out of popularity. We've used it in the past when we work on
features that require visualization (i.e. network graphs, workflows).
Might as well keep it around in case we use it again.
* Added hop node information and beefed up exec nodes section.
* Made updates to the Topology viewer & dis/associate peers
* Added hop node information and beefed up exec nodes section.
* Made updates to the Topology viewer & dis/associate peers
* Fixed build and image rendering issues.
* Replaced graphic with many many nodes!
* Updated images with latest
* Incorporated review inputs from @fosterseth.
* Incorporated more review feedback from @fosterseth
* Resized graphic
* Resized graphic again
* Reduced image sizes to scale better on different browsers.
* Added section for using private image for default EE.
* Deleted .md file for execution nodes due to migration to RTD.
* We introduced multiple networks to our docker env. The code this replaces
would return both networks' ip addresses concatenated, i.e.
'192.168.2.1192.168.2.3'.
* Add username and password to handle_auth and update exception message
Revise naming of ldap username and password
* Add url for LDAP and userpass to method_auth
* Add information regarding LDAP and username and password to credential plugins documentation
Revise ldap_auth to userpass_auth and revised exception to better reflect functionality
* Revise method_auth to ensure certs can be used with username and ensure namespace functionality is not hindered
Every so often we get connection timed out errors towards our HCP Vault
endpoint. This is usually when a larger number of jobs is running
simultaneously. Considering requests for other jobs do still succeed, this
is probably load related, and adding a retry should help make this a
bit more robust.
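One possible shape for the retry, assuming the lookup goes through `requests` (all parameters illustrative):
```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

session = requests.Session()
retries = Retry(
    total=5,
    backoff_factor=0.5,                # 0.5s, 1s, 2s, ... between attempts
    status_forcelist=[502, 503, 504],  # also retry transient server errors
    allowed_methods=None,              # retry all methods, incl. POST logins
)
session.mount('https://', HTTPAdapter(max_retries=retries))
response = session.get(hcp_vault_secret_url, timeout=30)  # hypothetical URL variable
```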
* Put the awx node(s) on a service-mesh docker network so they can be
proxied to. Also put all the other containers on an explicit awx
network, otherwise they can not talk to each other. We could probably be
more surgical about which containers we put on awx, but I just added all
of them.
Add support for receiving webhooks from Bitbucket Data Center, and add support for posting build statuses back
Note that this is very explicitly only for Bitbucket Data Center.
The entire webhook format and API is entirely different for Bitbucket Cloud.
* persist schedule prompt on launch fields when editing
* Merge job template default credentials with schedule overrides in schedule prompt
* rename vars for clarity
* handle undefined defaultCredentials
---------
Co-authored-by: Michael Abashian <mabashia@redhat.com>
AWX only sends Twilio notifications to one destination with the current version of the code, but this is a bug. Fixed this bug so SMS can be sent to multiple destinations.
* Move awxkit import code into a pytest fixture to better control when
the import happens
* Ensure /awx_devel/awxkit is added to sys path before awxkit import
runs
* Basic export tests
* Added test that highlights a problem with running Schedule exports as
non-root user. We rely on the POST key in the OPTIONS response to
determine the fields to export for a resource. The POST key is not
present if a user does NOT have create privileges.
* Fixed up forwarding all headers from the API server back to the test
code. This was causing a problem in awxkit code that checks for
allowed HTTP Verbs in the headers.
* Narrow the scope of RBAC evaluations
* Update tests for RBAC method changes
* Simplify queryset for credentials in org
* Fix call pattern to pass in team role obj
Due to https://github.com/ansible/awx/issues/7560
the 'omhttp' module for rsyslog will completely stop forwarding messages to the external log aggregator after receiving a 4xx error from it
This PR is a "workaround" for this problem: restart rsyslogd after detecting that rsyslog received a 4xx error
When making changes to the application you can sometimes accidentally cause a FATAL state and crash the dev container, which will remove any ephemeral changes you have made and is ANNOYING!
* Adding hosts bulk deletion feature
Signed-off-by: Avi Layani <alayani@redhat.com>
* fix the type of the argument
Signed-off-by: Avi Layani <alayani@redhat.com>
* fixing activity_entry tracking
Signed-off-by: Avi Layani <alayani@redhat.com>
* Revert "fixing activity_entry tracking"
This reverts commit c8eab52c2ccc5abe215d56d1704ba1157e5fbbd0.
Since the bulk_delete is not related to an inventory, only to hosts, which
can be from different inventories.
* get only needed vars to reduce memory consumption
Signed-off-by: Avi Layani <alayani@redhat.com>
* filtering the data to reduce memory increases the number of queries
Signed-off-by: Avi Layani <alayani@redhat.com>
* update the activity stream for inventories
Signed-off-by: Avi Layani <alayani@redhat.com>
* fix the changes dict initialization
Signed-off-by: Avi Layani <alayani@redhat.com>
---------
Signed-off-by: Avi Layani <alayani@redhat.com>
* Add TLS certificate auth for HashiCorp Vault
Add support for AWX to authenticate with HashiCorp Vault using
TLS client certificates.
Also updates the documentation for the HashiCorp Vault secret management
plugins to include both the new TLS options and the missing Kubernetes
auth method options.
Signed-off-by: Andrew Austin <aaustin@redhat.com>
* Refactor docker-compose vault for TLS cert auth
Add TLS configuration to the docker-compose Vault configuration and
use that method by default in vault plumbing.
This ensures that the result of bringing up the docker-compose stack
with vault enabled and running the plumb-vault playbook is a fully
working credential retrieval setup using TLS client cert authentication.
Signed-off-by: Andrew Austin <aaustin@redhat.com>
* Remove incorrect trailing space
Co-authored-by: Hao Liu <44379968+TheRealHaoLiu@users.noreply.github.com>
* Make vault init idempotent
- improve error handling for vault_initialization
- ignore error if vault cert auth is already configured
- removed unused register
* Add VAULT_TLS option
Make TLS for HashiCorp Vault optional and configurable via VAULT_TLS env var
* Add retries for vault init
Sometimes it takes longer for vault to fully come up and init fails
---------
Signed-off-by: Andrew Austin <aaustin@redhat.com>
Co-authored-by: Hao Liu <44379968+TheRealHaoLiu@users.noreply.github.com>
Co-authored-by: Hao Liu <haoli@redhat.com>
* [CI] Reduce GHA timeouts from 6h default
The goal here is to never interfere with a real run (so most of the
timeout-minutes values seem rather high) but to avoid having 6h long
runs if something goes crazy and never ends.
Signed-off-by: Rick Elrod <rick@elrod.me>
* Do bash hackery instead
Signed-off-by: Rick Elrod <rick@elrod.me>
---------
Signed-off-by: Rick Elrod <rick@elrod.me>
* Fixing wsrelay connection loop
* The loop was being interrupted when reaching the return statements, causing a race condition that would make nodes remain disconnected from their websockets
* Added log messages for the previous return state to improve the logging from this state.
* Added logging for malformed payload
* Update awx/main/wsrelay.py
Co-authored-by: Rick Elrod <rick@elrod.me>
* Moved logmsg outside condition
---------
Co-authored-by: Lucas Benedito <lbenedit@redhat.com>
Co-authored-by: Rick Elrod <rick@elrod.me>
The --location (-L) parameter will prompt curl to submit a new request if the URL is a redirect.
After moving to galaxy-NG, without -L the curl falsely returns 302 for any version
Co-authored-by: John Barker <john@johnrbarker.com>
* allow pytest --migrations to succeed
* We actually subvert migrations from running in test via pytest.ini
--no-migrations option. This has led to bit rot for the sqlite
migrations happy path. This changeset pays off that tech debt and
allows for an sqlite migration happy path.
* This paves the way for programmatic invocation of individual migrations
and weaving of the creation of resources (i.e. Instance, Job Template,
etc). With this, a developer can instantiate various database states,
trigger a migration, assert the state of the db, and then have pytest
rollback all of that.
* I will note that in practice, running these migrations is dog shit
slow BUT this work also opens up the possibility of saving and
re-using sqlite3 database files. Normally, caching is not THE answer
and causes more harm than good. But in this case, our migrations are
mostly write-once (I say mostly because this change set violates
that :) so cache invalidation isn't a major issue.
* functional test for migrations on sqlite
* We commonly subvert running migrations in test land. Test land uses
sqlite. By not constantly exercising this code path it atrophies. The
smoke test here is to continuously exercise that code path.
* Add ci test to run migration tests separately, they take =~ 2-3
minutes each on my laptop.
* The smoke tests also serves as an example of how to write migration
tests.
* run migration tests in ci
Currently if you clean up the docker volume for vault and bring docker-compose development back up with vault enabled, we will not initialize vault because the secret files still exist.
This change will attempt to initialize vault regardless and update the secret file if vault is initialized
Adding the possibility to decode base64 encoded strings from Delinea's DevOps Secrets Vault (DSV).
This is necessary as uploading files to DSV is not possible (and not meant to be), so files should be added base64 encoded.
The commit is making sure to remain backward compatible (no secret decoding), as a default is supplied.
This has been tested with DSV and works for secrets that are base64 encoded and secrets that are not base64 encoded (which is the default).
Signed-off-by: Steffen Scheib <sscheib@redhat.com>
* Set subscription type as developer for developer subscriptions.
Signed-off-by: Tong He <the@redhat.com>
* Set subscription type as developer for developer subscription manifests.
Signed-off-by: Tong He <the@redhat.com>
* Remedy the wrong character to assign value.
Signed-off-by: Tong He <the@redhat.com>
* Reformat licensing.py by black.
Signed-off-by: Tong He <the@redhat.com>
---------
Signed-off-by: Tong He <the@redhat.com>
* Setting credential_type as required
* Added test for missing credential_type in credential module
* Corrected test assertion
---------
Co-authored-by: Lucas Benedito <lbenedit@redhat.com>
* add alt to images in workflow_templates.rst
Signed-off-by: Ratan Gulati <ratangulati.dev@gmail.com>
* add alt to images in workflow_templates.rst
Signed-off-by: Ratan Gulati <ratangulati.dev@gmail.com>
* Update workflow_templates.rst
* Revised proposed alt text for workflow_templates.rst
---------
Signed-off-by: Ratan Gulati <ratangulati.dev@gmail.com>
Co-authored-by: TVo <thavo@redhat.com>
This fixes a bug where jobs within a workflow job were not canceled
when the workflow job was canceled by the user
The fix is to submit the cancel request as part of the
transaction that the WorkflowManager commits its work in.
This requires that we send the message without expecting a reply,
so this changes the control-with-reply cancel to just a control function.
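The useful property of a plain control message is that pg_notify delivery waits for COMMIT; a hedged sketch (names illustrative):
```python
from django.db import connection


def submit_cancel(queue, unified_job_id):
    with connection.cursor() as cur:
        # NOTIFY is only delivered on COMMIT, so the cancel reaches the
        # dispatcher exactly when the WorkflowManager's work is committed.
        cur.execute('SELECT pg_notify(%s, %s)', (queue, str(unified_job_id)))
```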
* convert to valid type for serializer
* check that extra_vars are in request
* remove doubled line
* add integration test for change
* move change to the ad_hoc_command module
Signed-off-by: jessicamack <jmack@redhat.com>
* fix imports
Signed-off-by: jessicamack <jmack@redhat.com>
---------
Signed-off-by: jessicamack <jmack@redhat.com>
* Fix Boolean values defaulting to False in collection
* Remove null values in other cases, fix null handling for WFJT nodes
* Only remove null values if it is a boolean field
* Reset changes to WFJT node field processing
* Use test content from sean-m-sullivan to fix lookups in assert
The web container does not need to wait for migration.
If the database is running and responsive but migrations have not finished, it will start serving and users will get the upgrading page.
wait-for-migration prevented nginx and uwsgi from starting up to serve the "upgrade in progress" status page
There are a number of changes here:
- Abstract out a GHA composite action for running the dev environment
- Update the e2e tests to use that new abstracted action
- Introduce a new (matrixed) job for running collection integration
tests. This splits the jobs up based on filename.
- Collect coverage info and generate an html report that people can
download easily to see collection coverage info.
- Do some hacks to delete the intermediary coverage file artifacts
which aren't needed after the job finishes.
Signed-off-by: Rick Elrod <rick@elrod.me>
Might help to install receptor last,
that way when nodes are first connected to the mesh
they already have podman installed and can potentially
run jobs. Otherwise it might be possible for controller
to launch jobs against nodes that aren't fully set up.
Add a check_peers_changed() utility method
to determine if peers in attrs matches
the current instance peers.
Other changes:
- Set ip_address default to "", and do not
allow null.
Get rid of PeersSerializer and just use SlugRelatedField,
which should be a more straightforward approach.
Other changes:
- cleanup code related to the already-removed api/v2/peers
endpoint
- add "hybrid" node type into more instance_peers test cases
API changes
- cannot change peers or enable
peers_from_control_nodes on VM deployments
- allow setting ip_address
- use ip_address over hostname in the generated
group_vars/all.yml
- Drop api/v2/peers endpoint
DB changes
- add ip_address unique constraint, but ignore "" entries
Other changes
- provision_instance should take listener_port option
Tests
- test that new controls doesn't disturb other peers
relationships
- test ip_address over hostname
Dynamically flipping from Established
to Disconnected is not the intended
usage of InstanceLink State.
- Link state starts in Adding and becomes
Established once any control node first sees the link
is in the status KnownConnectionCosts
inspect_established_receptor_connections should
not change link state if the current state is Removing.
Other changes:
- rename inspect_execution_nodes to inspect_execution_and_hop_nodes
- Default link state is Adding
- Set min listener_port value to 1024
- inspect_established_receptor_connections now
runs as part of cluster_node_heartbeat task
rename migration function set_peers_from_control_nodes_true to automatically_peer_from_control_plane
import settings and only run function if settings.IS_K8S is true
set listener_port for control nodes to None
Add hop node support to awx collections
- add peers and peers_from_control_nodes fields
- show new node_type "hop"
- add tests for adding hop nodes via collections
Co-authored-by: Seth Foster <fosterseth@users.noreply.github.com>
Add Disconnected link state
introspect_receptor_connections is a periodic
task that examines active receptor connections
and cross-checks it with the InstanceLink info.
Any links that should be active but are not
will be put into a Disconnected state. If
active, it will be in an Established state.
UI - Add hop creation and peers mgmt (#13922)
* add UI for mgmt peers, instance edit and add
* add peer info on detail and bug fix on detail
* remove unused chip and change peer label
* rename lookup, put Instance type disable on edit
---------
Co-authored-by: tanganellilore <lorenzo.tanagnelli@hotmail.it>
Add the full path for the mv command so that the command can be run from ui_next and from the project root.
Additionally, move the file rename to the src build step.
Dispatcher refactoring to get pg_notify publish payload
as separate method
Refactor periodic module under dispatcher entirely
Use real numbers for schedule reference time
Run based on due_to_run method
Review comments about naming and code comments
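A minimal sketch of a due_to_run-driven schedule keyed on real-number reference times (illustrative, not the actual dispatcher code):
```python
import time


class PeriodicTask:
    def __init__(self, period):
        self.period = period              # seconds between runs
        self.last_run = time.monotonic()  # real-number reference time

    def due_to_run(self, now=None):
        now = time.monotonic() if now is None else now
        return (now - self.last_run) >= self.period

    def mark_run(self):
        self.last_run = time.monotonic()
```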
We introduce a thin wrapper over Django's RedisCache so that the functionality of DJANGO_REDIS_IGNORE_EXCEPTIONS is retained while still being able to drop the django-redis dependency.
Credit to django-redis's implementation for the idea of using a decorator for this and abstracting out the exception handling logic.
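A hedged sketch of that wrapper idea (Django 4+ ships `django.core.cache.backends.redis.RedisCache`; the option handling is illustrative):
```python
import functools
import logging

from django.core.cache.backends.redis import RedisCache

logger = logging.getLogger(__name__)


def ignore_cache_errors(func):
    @functools.wraps(func)
    def wrapper(self, *args, **kwargs):
        try:
            return func(self, *args, **kwargs)
        except Exception:
            if self._ignore_exceptions:  # mirrors DJANGO_REDIS_IGNORE_EXCEPTIONS
                logger.warning('Ignored redis error in %s', func.__name__)
                return None  # behaves like a cache miss
            raise
    return wrapper


class AWXRedisCache(RedisCache):
    def __init__(self, server, params):
        self._ignore_exceptions = params.pop('IGNORE_EXCEPTIONS', True)
        super().__init__(server, params)

    get = ignore_cache_errors(RedisCache.get)
    set = ignore_cache_errors(RedisCache.set)
```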
Signed-off-by: Rick Elrod <rick@elrod.me>
This fixes https://github.com/ansible/awx/issues/14245 which has
more information about this issue.
This change addresses both:
- A clashing signal handler (registering a callback to fire when
the task manager times out, and hitting that callback in cases
where we didn't expect to). Make dispatcher timeout use
SIGUSR1, not SIGTERM.
- Metrics not being reported should not make us crash, so that is
now fixed as well.
Signed-off-by: Rick Elrod <rick@elrod.me>
Co-authored-by: Alan Rominger <arominge@redhat.com>
On some systems, /bin/sh is a bash symlink and running it will launch
bash in sh compatibility mode. However, bash-specific syntax will still
work in this mode (for example using == or pipefail).
However, on systems where /bin/sh is a symlink to another shell (think:
Debian-based) they might not have those bashisms.
Set the shell in the Makefile, so that it uses bash (since it is already
depending on bash, even though it is calling it as /bin/sh by default),
and add a shebang to pre-commit.sh for the same reason.
Signed-off-by: Rick Elrod <rick@elrod.me>
Added persistent storage
Auto-create vault and awx via playbooks
Create a new pattern for custom containers where we can do initialization
Auto-install roles needed for plumbing via the Makefile
When we close/cancel a connection to a web node, give the task time to
clean up after itself and cleanly exit. Otherwise, the Python GC might
clean up the task too early and this leads to ugly log messages like
this: "Task was destroyed but it is pending!"
Signed-off-by: Rick Elrod <rick@elrod.me>
NUL characters are not allowed in text fields in the database
We used to strip them out of stdout but the exception changed
And we want to be sure to strip them out of JSONBlob fields
- Nix an unused function from run_dispatcher. This stopped being used
in 558e92806b but was never removed.
- Fix a typo in run_ws_heartbeat: hearbeat -> heartbeat that has existed
since the beginning of this daemon.
Signed-off-by: Rick Elrod <rick@elrod.me>
Right now we only enable queuing on the rsyslog main_queue. This adds a
parameter to also enable it on the omhttp output action. As omhttp can
take time to process messages (e.g. blocking on the result of its HTTP
requests), this change allows for queuing messages up and hopefully
preventing some messages from getting lost when the log server is slow
to respond.
Signed-off-by: Rick Elrod <rick@elrod.me>
This is expected to free up 4 additional database connections per traditional node,
compared to roughly 12 in total before this change.
Out of these, 3 are accomplished by using existing connections for recently added services,
and 1 is obtained by closing the connection for the idle callback receiver main process.
Signed-off-by: jessicamack <jmack@redhat.com>
Co-authored-by: jessicamack <jmack@redhat.com>
This adds a handful of metrics to /api/v2/metrics/ recorded from the dispatcher main process
Adds logic in the dispatcher period tasks to calculate these for the last collection interval
Reports worker count, task count, scale up events, and availability
Add data to demo grafana dashboard
This fixes two different exceptions in wsrelay.
* One resulted from heartbeet getting ability in #13858 to gracefully
shut down. When we saw the message come through, we didn't fully
clean up the connection to the web node.
* The second resulted when Redis disappeared. We still want to exit in
that case, but it's better to log a message and exit gracefully
instead of crashing out.
Signed-off-by: Rick Elrod <rick@elrod.me>
raise Exception in the case that return code is non-zero
this approach has shown itself to be the most consistently reliable across multiple ecosystems
This was caused by an incorrect parent_key ref from label to job
also applies to workflow_job labels
This fixes a regression introduced by a recent merge (#13957)
Due to dependency issues specifically around upgrading to Django 4.2, we
cannot feasibly have a dependency on psycopg2 and psycopg3. The only
place that was currently using psycopg3 was wsrelay.
Change wsrelay to use the asyncpg library and psycopg2 instead.
Tested locally on kind with a dev build of awx.
Signed-off-by: Rick Elrod <rick@elrod.me>
This was making host sub-list views non-functional,
specifically for constructed and smart inventory;
views would always return 0 results before this fix
In a prior merge, we added the ability to slap filter_read_permission = False on a view to get a certain functionality where it didn't filter a sublist the view is showing.
This logic already existed in a highly duplicated form among a number of views, so this deletes those methods in favor of the flag.
* Fix organization not showing all galaxy credentials for org admin
* Add basic test to ensure counts
* refactored approach to allow removal of redundant code
* Allow configurable prefetch_related
* implicitly get related fields
* Removed extra queryset code
ad_hoc_command_cancel really can no longer time out on a cancel (it happens sub-second), so remove the unneeded block
Modified all tests to respect the test_id parameter so that all tests can be run together as a single ID
Fix a check in group since group2 is deleted from being a sub group of group1
The UI now allows propagating sub groups to the inventory, which we may want to support within the collection
Only run the instance integration test if we are running on k8s, and assume we are not by default
Fix hard coded names in manual_project
* Use separate module for test settings
* Further refine some pre-existing comments in settings
* Add CACHES to setting snapshot exceptions to accommodate changed load order
- Change default PYTHON in Makefile to be ranked choice
- Fix `PYTHON_VERSION` target that expects just a word
- Use native GNU Make `$(subst ,,)` instead of `sed`
- Add 'version-for-buildyml' target to simplify ci
If I understand correctly, this change should make
'$(PYTHON)' work how we want it to everywhere. Before
this change, on developers' machines that don't have
a 'python3.9' in their path, make would fail. With this
change, we will prefer python3.9 if it's available, but
we'll take python3 otherwise.
* Avoid recursive include of DEFAULT_SETTINGS, add sanity test to avoid similar surprises
* Implement review comments for more clear code order and readability
* Clarify comment about order of app name, which is last in order so that it can modify user settings
we link awx.egg-link from `tools/docker-compose/awx.egg-link` to `/tmp/awx.egg-link`, then we move `/tmp/awx.egg-link` to `/var/lib/awx/venv/awx/lib/python3.9/site-packages/awx.egg-link`
bonus... now we don't have to set PYTHON=python3.9
Linking the launch script and supervisor conf file in the kube development environment so we no longer have to rebuild kube devel images for supervisor conf file and launch script changes
- use different dockerfile for awx_devel and awx image
- make all Dockerfile* targets PHONY (because it's cheap to run)
- fix HEADLESS not working for awx-kube-build
In a web/task split deployment, the web and task containers no longer share the same redis cache
In the original code we used the redis cache to pass the list of sub objects that need to be copied to the new object
In this PR we extracted the logic that computes the sub_object_list and moved it into the deep_copy_model_obj task
* Introduce new method in settings, import in-line w NOQA mark
* Further refine the app_name to use shorter service names like dispatcher
* Clean up listener logic, change some names
* Fixed #13402 allow user-defined key retrieval from CYBR
* Add default value to object_property
* Raise ValueError if object_property not in response
* Raise KeyError instead of ValueError
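A minimal sketch of the retrieval behavior described in these bullets; the default property name is an assumption for illustration:
```python
def extract_property(response: dict, object_property: str = "Content") -> str:
    # "Content" stands in for whatever the plugin's default property is.
    if object_property not in response:
        # KeyError rather than ValueError: the failure is a missing key.
        raise KeyError(f"{object_property!r} not found in object: {sorted(response)}")
    return response[object_property]
```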
When the API request is for /inventories/id use that as the URL in the
API response. When the request is for /constructed_inventories/id use
that.
Signed-off-by: Rick Elrod <rick@elrod.me>
these make targets are for starting the different daemons within the kube/docker development environment; updating the names to better reflect their intention
also added comments above the make targets to describe what they do
note: these comments show up when running `make help`
previously this was used so that a task running in the task container could reach into the web container to restart rsyslog
now that the web container and task container are split, there's no longer a way to do that, so I renamed this env var to reflect what it now points to
which is the supervisor conf file of the currently running container
launch_awx.sh, which this PR renames, is also now only used for launching the awx web container, so it is renamed to reflect its purpose
also removed the no-longer-needed creation of the rsyslog conf, as rsyslog is no longer in the web container
Update Dockerfile.j2
supervisor.conf.j2 is the template for the supervisor.conf file of the web container; renaming it to supervisor_web.conf makes it clearer that it is used for the web container
get_local_queuename will return the pod name of the instance
now that web and task are in different pods, when the web container queues a task it will be put into a queue without a task worker to execute the task
Works by adding a dedicated producer in wsrelay that looks for
local django channels message with group "metrics". The producer
sends this to the consumer running in the web container.
The consumer running in the web container handles the message by
pushing it into the local redis instance.
The django view that handles a request at the /api/v2/metrics
endpoint will load this data from redis, format it, and return the
response.
We internally manipulate the message payload a bit (to know whether it
originated on the task side or on the web side). But when we get the
message, we actually get a reference to the dict containing the payload.
Other producers in wsrelay might still be acting on the message and
deciding whether or not to relay it. So we need to manipulate and send a
*copy* of the message, and leave the original alone.
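A minimal sketch of the fix: mutate and send a deep copy so other producers still see the original payload untouched (the marker field is illustrative):
```python
import copy

def prepare_for_relay(message: dict) -> dict:
    outgoing = copy.deepcopy(message)  # never mutate the shared reference
    outgoing["origin"] = "task"        # illustrative marker, not the real field
    return outgoing
```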
Signed-off-by: Rick Elrod <rick@elrod.me>
We no longer need to do this from wsrelay, as it will automatically try
to reconnect when it hears the next beacon from heartbeet.
This also cleans up the logic for what we do when we want to delete a
node we previously knew about.
Signed-off-by: Rick Elrod <rick@elrod.me>
* Created new rsyslog launch file.
* Rsyslog conf work.
* Refining how we're calling rsyslog conf.
* Removed rsyslog so it no longer launches in the web container.
* Added the new launch_awx_rsyslog.sh to the /usr/bin
* add management command and logging for new daemon
* switch tasks over to calling pg_notify
* add daemon to docker-compose and supervisor
* renamed handle_setting_changes and moved notify call
* removed initial rsyslog configure from dispatcher
* add logging and clear cache before reconfigure
* add notify to delete
* moved pg_notify to own function
* update tests impacted by rsyslog change
* changed over to new pg_notify method
Signed-off-by: Jessica Mack <jmack@redhat.com>
* Save facts on model for original host
Redirect to original host for ansible facts
Use current inventory hosts for facts instance_id filter
Thanks to Gabe for identifying this bug
* Fix spelling of queryset
Co-authored-by: Rick Elrod <rick@elrod.me>
* Fix sign error with facts expiry - from review
---------
Co-authored-by: Rick Elrod <rick@elrod.me>
* Backlink events to real hosts and summaries to both hosts
* Prevent error when original host is deleted during job run
* No duplicate entries, review suggestion from Rick
* Change word tense in help text, dict style adjustments
From code review
Co-authored-by: Rick Elrod <rick@elrod.me>
* Back out new variable for constructed host id
---------
Co-authored-by: Rick Elrod <rick@elrod.me>
- When updating, we need the original object so we can make sure we
aren't changing things we shouldn't be.
- We want to allow source_vars and limit, but not much else.
- We want to block everything else (at least, if it doesn't match what
is in the original object...to allow the collection to work properly).
- Add two functional tests.
Signed-off-by: Rick Elrod <rick@elrod.me>
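A minimal sketch of that validation rule (field names from the text above, helper shape assumed):
```python
ALLOWED_CHANGES = {"source_vars", "limit"}

def validate_update(original: dict, incoming: dict) -> None:
    for field, value in incoming.items():
        if field in ALLOWED_CHANGES:
            continue
        # Other fields may be submitted, but only if they match the
        # original object, so the collection keeps working.
        if field in original and original[field] != value:
            raise ValueError(f"Field {field!r} cannot be changed on update")
```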
Including changes to our custom Ordered m2m field, which previously broke
if the source and target model were the same.
Signed-off-by: Rick Elrod <rick@elrod.me>
Co-authored-by: Alan Rominger <arominge@redhat.com>
- add kind 'constructed' to inventory module
- add 'input_inventories' field to inventory module
Co-authored-by: Rick Elrod <rick@elrod.me>
Signed-off-by: Rick Elrod <rick@elrod.me>
* Add constructed inventory docs and do minor field updates
Add verbosity field to the constructed views
automatically set update_on_launch for the auto-created constructed inventory source
Make the GET function work at most basic level
Basic functionality of updating working
Add functional test for the GET and PATCH views
Add constructed inventory list view for direct creation
Add limit field to constructed inventory serializer
move limit field from InventorySourceSerializer to InventorySourceOptionsSerializer (#13464)
InventorySourceOptionsSerializer is the parent for both InventorySourceSerializer and InventoryUpdateSerializer
The limit option needs to be exposed to both inventory_source and inventory_update
Co-Authored-By: Hao Liu <44379968+TheRealHaoLiu@users.noreply.github.com>
this allows you to pre-build ui_next outside of the container, and it won't try to rebuild when you build the awx image
`make ui-next` will no longer rebuild if awx/ui_next/build exists
- move placeholder index_awx.html out of the ui_next build dir
- copy index_awx.html to the build dir during development bootstrap if UI_NEXT has not been built
* Fix race with heartbeat and reaper logic
* Fix tests to fail when drift exceeds the heartbeat time
* replaced modified with started time for reap() code and added test
* fixed logic bug and cleaned up tests
* Added comments to tests to call out reasoning
- Add new makefile for building ui_next
- Add setting to toggle ui_next
- Add URL path for displaying ui_next
- Update collectstatic and template dir config to serve ui_next
The class that contained these tests wasn't named Test*, so the tests in
it weren't running. Fix that and fix the tests in it so that they pass.
Signed-off-by: Rick Elrod <rick@elrod.me>
* Adds support for a pseudolocalization query param to check to see whether a string has been marked for translation
* Adds support for passing a lang param to force rendering in a particular language
* Remove unused import
* adding roles to instance groups
added ResourceMixin to InstanceGroup and changed the filtered_queryset
* added necessary changes to rebuild relationship between IG and roles
* added description to InstanceGroupAccess
* preliminary ui plug for demo purposes
* preliminary ui plug for demo purposes
added inventory special logic for use_role to allow attaching instance groups
added more tests to handle those cases
* Add access_list to InstanceGroup
* scratch branch to test migration work
* refactored to shorten logic
* Added migration and am removing logic that enabled Org admin permissions
* Add Obj admin role to JT, Inv, Org
* Changed tests to reflect new permissions
* refactored some of the tests
* cleaned up more tests and reworded help on InstanceGroupAccess
* Removed unnecessary delete of Route for instance group perms change
* Fix UI tests and migration
* fixed permissions on prompt for InstanceGroups
* added related object roles endpoint
* added ui/api function for options instance_groups
* separate the migrations in order to avoid issues with migrations not being finished
* changed migrations parent class to disable the activity stream error in migrations
* Added logging to migration as activitystream is disabled
* added clarifying comment to JobTemplateAccess and linted UI addition
* renamed migrations to avoid collisions
* Rename migrations to avoid collisions
Move the config template v1 to v2
delete other v1 views since v1 is deleted
the collection of host facts over time was removed
also the job start view was removed
Insights integration was changed and the host insights
view no longer exists
Slightly modernize config help
- periodically ping postgres on port 5432 and only start
migrations if successful.
- prevents crash loop when attempting migrations before
postgres is ready.
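A minimal sketch of the readiness check, assuming the default host and port:
```python
import socket
import time

def wait_for_postgres(host: str = "postgres", port: int = 5432, interval: float = 5.0) -> None:
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return  # postgres is accepting connections; safe to migrate
        except OSError:
            time.sleep(interval)  # not ready yet; keep pinging instead of crash-looping
```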
linting
linting again
Use the correct role on org permission check
Co-authored-by: Elijah DeLee <kdelee@redhat.com>
Update docs/bulk_api.md
Co-authored-by: Elijah DeLee <kdelee@redhat.com>
Update docs/bulk_api.md
Co-authored-by: Elijah DeLee <kdelee@redhat.com>
Update awx/main/access.py
Co-authored-by: Elijah DeLee <kdelee@redhat.com>
Update awx/main/access.py
Co-authored-by: Elijah DeLee <kdelee@redhat.com>
Update docs/bulk_api.md
Co-authored-by: Alan Rominger <arominge@redhat.com>
fix collection test (#19)
improve readability of through model object creation (#18)
lower num jobs/hosts in tests (#20)
we can test query scaling at lower numbers, to reduce
load in tests. We suspect this was causing some flake
in the tests on PRs
adjust the num of queries
* Bulk launch serializer RBAC and code structure review
Use WJ node as base in bulk job launch child
remove fields we get for free this way
Minor translation marking
Consolidate bulk API permission methods
split out permission check for each UJT type
Code consolidation for org check method
add a save before starting the workflow job
Make the max host default 100. We are seeing that with a moderate number of hosts, i.e. 500 hosts each having a few host variables, the request runs into nginx's max message size and nginx rejects the request.
We are therefore keeping the value small so that it doesn't fail even with a decent number of host variables.
remove the 999 hosts test because the default max is 100
fix the credential check
fix the instance groups and execution env permission checks
* evaluate max bulk settings in validate...
instead of in a class attribute. This makes them load at request time
instead of at app start-up time, which fixes problems with tests.
I also think it will be a better user experience: if admins
actually do change the setting, it will apply without restarting the
django app on each instance
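A minimal sketch assuming a DRF serializer and an illustrative setting name:
```python
from django.conf import settings
from rest_framework import serializers

class BulkJobLaunchSerializer(serializers.Serializer):
    jobs = serializers.ListField()

    def validate_jobs(self, jobs):
        # Read the setting here, at request time, instead of in a class
        # attribute evaluated once at app start-up.
        max_jobs = getattr(settings, "BULK_JOB_MAX_LAUNCH", 100)  # assumed name
        if len(jobs) > max_jobs:
            raise serializers.ValidationError(f"Cannot launch more than {max_jobs} jobs at once.")
        return jobs
```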
* improve OPTIONS by not manually declaring fields
alan pointed this out
* fix access problems and add bulk job max settings to api
filter workflow job nodes better
This will both improve performance by limiting the queryset for the node
sublists as well as fix our access problem.
override can_read instead of modify queryset in access.py
We do this because we are not going to expose bulk jobs to the list
views, which is complicated and has poor performance implications.
Instead, we just care about individual Workflows that clients get linked
to not being broken.
fix comment
remove the get functions from the conf.py for bulk api max value
comment the api expose of the bulk job variables
reformat conf.py with make black
trailing space
add more assertion to the bulk host create test
check label permission and fix lint (#13)
* set created by and launch type correctly
This makes "launched_by" get computed right in the tests.
Mysteriously this seemed to work from API browser, but
this seems more correct to have it work this way, and makes
tests actually work.
For "manual" launch types the attribute used to populate "launched_by"
is "created_by". And we already have "is_bulk_job" to indicate that the
job is a bulk job. So let's just use this.
* check label is in an organization you can read
* add assertions around access to resulting job
there is a problem getting the job w/ the user that launched it
add more assertions to bulk tests (#11)
dig more into the results and assert on results
also, use a fixture that already implemented the "max queries" thing
fix ansible collection sanity tests (#12)
Various fixes
- Don't skip checking resource RBAC permissions for admins
Necessary to handle bad input, e.g. providing a
unified_job_template id that doesn't exist
- In awxkit, only "walk" if we get 'url' in the result
- Bulk host create should return url pointing to inventory,
not inventory/hosts
don't do org check for superuser
fix the api-lint
fix the api-lint
add the description to the bulk job launch module params
add the description for the description field
add the description for the description field
add docs for the bulk api
fix the models on the bulk api serializers
fix some of the issues highlighted in the code review
better use of role model
remove comments
better error message
revert the PrimaryKeyRelatedField for unified_job_template and inventory
Enabled the params for the bulk job
make black
make black again
Fixed inventory and organization input params for bulk modules
add collection integration tests
Fix cli return errors
fix test completeness
return more context for bulk host create
now return list of minimal info about host objects
```
[
  {
    "name": "lakjdsafoiaweirnladlk",
    "enabled": true,
    "instance_id": "",
    "description": "",
    "variables": "",
    "id": 4593,
    "url": "/api/v2/hosts/4593/",
    "inventory": "/api/v2/inventories/1/"
  }
]
```
Updated tests, but needed to work around some weird behavior with
sqlite. Apparently it behaves differently around assigning IDs to the
result of bulk_create, and that messed up my use of `reverse` to look
up the url of the hosts
make black changes
increase the number of queries to 30
fix the flake failure
add functional changes for bulk job launch and some minor fixes
pull changes
we needed to inherit from GenericAPIView to get the options to render
correctly
add execution env support
add organization validation to the workflowjob
Update awx/api/serializers.py
Co-authored-by: Elijah DeLee <kdelee@redhat.com>
Update awx/api/serializers.py
Co-authored-by: Elijah DeLee <kdelee@redhat.com>
Provide a view that allows users to launch many jobs with one POST
request. Under the hood, this creates a workflow with a number of jobs
all in a "flat" structure -- much like a sliced job, but with arbitrary
"joblets".
For ~100 nodes we are looking at ~200 queries, which is more than the
proof of concept, but still an order of magnitude better than individual
job launches.
Still more work to implement other m2m objects, and need to address what
Organization should be assigned to a WorkflowJob launched by a BulkJob.
They need this so they can step into the workflow_job_nodes and get the
status of all the containing jobs.
Also want to test when there are MANY job templates etc in the system,
because queries like
UnifiedJobTemplate.accessible_pk_qs(request.user, 'execute_role').all()
scare me; it seems like they could return a lot of things.
use "many=True" instead of ListField
Still seeing identical number of queries when creating 100 jobs, going to
investigate more
only validate type in nested serializer
then, we actually get the database object after we do the RBAC checks
This drops us down from hundreds of queries to launch 100 jobs,
to less than 100 queries to launch 100 jobs (I got around 24 queries to
launch 100 jobs with credentials)
pave way for more promptable things
add "limit" as possible prompt on launch to bulk jobs
re-organize how we add credentials to pave way for the other m2m items
not having to repeat too much code
add labels to the bulk job
add the other fields to the workflowjobnode
move urls around
allow system admins, org admins, and inventory admins to bulk create
hosts.
Testing on an "open" licensed awx as system admin, I created 1000 hosts with 6 queries in ~ 0.15 seconds
Testing on an "open" licensed awx as organization admin, I created 1000 hosts with 11 queries in ~ 0.15 seconds
fix org max host check
also only do permission denied if license is a trial
add /api/v2/bulk to list bulk apis available
add api description templates
One motivation to not take a list of hosts with mixed inventories is to
keep things simple re: RBAC and keeping a constant number of queries.
If there is great clamor for accepting list of hosts to insert into
arbitrary different inventories, we could probably make it happen - we'd
need to pop the inventory off of each of the hosts, run the
HostSerializer validate, then in top level BulkHostCreateSerializer
fetch all the inventories/check permissions/org host limits for those
inventories/etc. But that makes this that much more complicated.
add test for rbac access
test also helped me find a bug in a query, fixed that
add test to assert num queries scales as expected
also move other test to dedicated file
also test with super user like I meant to
record activity stream for the inventory
this records that a certain number of hosts were added by a certain user
we could consider if there is any other additional information we want
to include
* Azure AD users should not be able to change their password
* Multiple auth changes
Moving get_external_user function into awx.sso.common
Altering get_external_user to not look at current config, just user object values
Altering how api/conf.py detects external auth config (and making reusable function in awx.sso.common)
Altering logic in api.serializers in _update_password to use awx.sso.common
* Adding unit tests
---------
Co-authored-by: John Westcott IV <john.westcott.iv@redhat.com>
awx-kube-build and docker-compose-build share the same Dockerfile
if you run awx-kube-build then docker-compose-build in succession, the second command won't re-run the Dockerfile target, which causes the image to be built with the incorrect Dockerfile
from @relrod
`head` will close the input fd when it no longer needs it (or exits). find will try to write to the closed fd and somewhere along the way, it will receive SIGPIPE as a result. This is why `yes | head -5` doesn't run forever.
docker-compose v1 is EOL since April 2022 and hasn't received any
updates since May 2021. docker compose v2 is a complete rewrite in
Go which acts as a plugin for the main docker application.
The syntax is the same; only the `compose` command invocation differs.
This commit adds the ability to override the default `docker-compose`
command using `make DOCKER_COMPOSE='docker compose'`.
Signed-off-by: Tom Siewert <tom@siewert.io>
Providing defaults for API parameters where the API already provides
defaults leads to some confusing scenarios, because we end up always
sending those collection-defaulted fields in the request even if the
field isn't provided by the user.
For example, we previously set the `scm_type` default to 'manual', so
someone using the collection to update a project, without explicitly
including the `scm_type` every time they call the module, would
inadvertently change the `scm_type` of the project back to 'manual',
which is surprising behavior.
This change removes the collection defaults for API parameters, unless
they differed from the API default. We let the API handle the defaults
or otherwise ignore fields not given by the user so that the user does
not end up changing unexpected fields when they use a module.
Signed-off-by: Rick Elrod <rick@elrod.me>
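A minimal sketch of the idea for a project-like module; the choices shown are illustrative:
```python
argument_spec = dict(
    name=dict(required=True),
    # Previously: scm_type=dict(choices=[...], default='manual'), which
    # silently reset existing projects back to 'manual' on every run.
    # With no default, an unspecified field is simply not sent, and the
    # API keeps control of its own defaults.
    scm_type=dict(choices=["", "git", "svn", "insights", "archive"]),
)
```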
According to the latest documentation, role and value are now "one or more" fields, so they both need to be arrays. Entering the json data as shown in this article doesn't work, but once I added the brackets, it worked.
Thank you
* Moving reconcile_users_org_team_mappings into common library
* Renaming pipeline to social_pipeline
* Breaking out SAML and generic Social Auth
* Optimizing SAML login process
* Moving extraction of org in teams from backends into sso/common.create_orgs_and_teams
* Altering saml_pipeline from testing
Prefixing all internal functions with _
Modified subfunctions to not return values but instead manipulate mutable objects
Modified all functions to not add duplicate orgs to the orgs_to_create list
* Updating the common function to respect a teams organization name
* Added can_create flag to create_org_and_teams
This made testing easier and allows any adapter with such a flag to simply pass it into the function
* Multiple changes to SAML pipeline
Removed orgs_to_create from being passed into user_team functions, common create orgs code will add any team orgs to list of orgs automatically
Passed SAML_AUTO_CREATE_OBJECTS flag into create_org_and_teams
Fix bug where we were looking at values instead of keys
Added loading of all teams if remove flag is set in update_user_teams_by_saml_attr
* Moving common items between SAML and Social into a 'base'
* Updating and adding testing
* Renamed get_or_create_with_default_galaxy_cred to get_or_create_org_...
_update_m2m_from_groups must return None if remove_* is false or empty,
because None indicates that the user permissions will not be changed.
related #13429
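A minimal sketch of the contract; the matcher is a stand-in for the real group logic:
```python
def _update_m2m_from_groups(user_groups, expected_groups, remove=True):
    """Return True to grant, False to revoke, None to leave unchanged."""
    if any(g in user_groups for g in expected_groups):
        return True
    if remove and expected_groups:
        return False
    # remove_* is false or empty: None means the user's permissions
    # will not be changed at all.
    return None
```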
Perform a git reset --hard before attempting to release awxkit to pypi.
We found that something new in the process was causing an unexpected behavior if the git tree had any changes inside it.
It would cause a devel version to be created and used as part of the upload which pypi was refusing.
Collections cannot easily be deleted from galaxy, so if we have to rerun a job because of a pypi or quay failure, we don't want to try to upload the collection again.
* first deprecation pass, need to confirm date or version
* remove doc block updates as not needed, update runtime and remove symlinks
* add line to readme as notable release
* update version before release
* Workaround for events with NUL char, touch up error loop
This fixes an error where some events would not save
due to having the 0x00 character which errors in postgres
this adds a line to replace it with empty text
Hitting that kind of event put us in an infinite error loop,
so this commit makes a number of changes to prevent similar loops
the showcase example is a negative counter,
this is not realistic in the real world but works for unit tests
These error loop fixes seek to establish the cases where we clear the buffer
Some logic is removed from the outer loop, with the idea that
ensure_connection will better distinguish flake
* From review comments, delay NUL char sanitization to later
Use pop to make list operations more clear
* Fix incorrect use of pop
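A minimal sketch of the sanitization step, applied just before saving the event data:
```python
def sanitize_event_data(data: dict) -> dict:
    # postgres text fields reject 0x00, so replace it with empty text.
    return {k: v.replace("\x00", "") if isinstance(v, str) else v
            for k, v in data.items()}
```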
Description
Thycotic has various types of Secret Templates, like Password and SSH Key.
The Thycotic API returns a str for the Password template, but a
requests.models.Response object for the SSH Key template. The current
implementation only considers the Password template; for the SSH Key case
the code needs to return the str extracted from the requests.models.Response.
Signed-off-by: Tarun CHawdhury <tarunchawdhury@gmail.com>
HC Vault clusters use eventual consistency and might return an HTTP 412
if the secret ID hasn't replicated yet to the replicas / standby nodes.
If this happens the request should be retried.
related #13413
Signed-off-by: Kristof Wevers <kristof.wevers@infura.eu>
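A minimal sketch of the retry, with illustrative attempt counts and backoff:
```python
import time
import requests

def read_secret(url: str, headers: dict, attempts: int = 5) -> dict:
    for attempt in range(attempts):
        resp = requests.get(url, headers=headers)
        if resp.status_code != 412:  # 412: secret ID not replicated yet
            resp.raise_for_status()
            return resp.json()
        time.sleep(2 ** attempt)  # back off while replication catches up
    raise RuntimeError("secret ID never became consistent")
```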
With the latest version of rsyslog we had a test failing with:
```
AssertionError: Response data: {'error': "b'rsyslog internal message (3,-2455): could not transfer the specified internal posix capabilities settings to the kernel, capng_apply=-5\\n [v8.2102.0-107.el9 try https://www.rsyslog.com/e/2455 ]\\n'"}
```
Downgrading fixes it
If a job fails, we fetch receptor work results and put that output
into result_traceback.
We should only do this if
1. Receptor unit has failed
2. Runner callback processed 0 events
Otherwise we risk putting too much data into this field.
The hiredis 2.1.0 release doesn't provide a source distribution on PyPI so
users can't build that python package from sources.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
Use a Makefile arg for the ansible-test sanity CLI args
defaults to --docker
in the future we probably need to customize python versions
Copy the rule exception for Ansible 2.15
this helps people who are running from Ansible devel
Also just ignore one sanity test for the export module, instead of
ignoring all of them.
Also use latest ansible-test, and make it work on GHA (by using podman
instead of docker).
Signed-off-by: Rick Elrod <rick@elrod.me>
* Run collection sanity tests in CI
This requires adding a Makefile install of ansible-core
Fake the version to make semver check happy
* Fixes from ansible-test sanity failures
* Exclude the export module due to awxkit requirement
* Fix broken ansible-test rule exceptions
remove Ansible 2.14 exclusions that make ansible-test ERROR, saying they are not needed
* fix pytz
* fix NameError
* fix tests and add sanity ignore files for the import test until distutils is replaced
* change static method to regular method and update test to instantiate class
- `settings/minikube.py` gets imported conditionally, when the
environment variable `AWX_KUBE_DEVEL` is set. In this imported file,
we set `BROADCAST_WEBSOCKET_PORT = 8013`, but 8013 is only used in the
docker-compose dev environment. In Kubernetes environments, 8052 is
used for everything. This is hardcoded in awx-operator's ConfigMap.
- Also rename `minikube.py` because it is used for every kind of
development Kube environment, including Kind.
Signed-off-by: Rick Elrod <rick@elrod.me>
This fixes several things related to our wsbroadcast stats handling.
This was found during the ongoing wsrelay work.
There are really three fixes here:
- Logging was not actually enabled for the analytics.broadcast_websocket
module, so that has been added to our loggers config.
- analytics.broadcast_websocket was not actually able to connect to
Redis due to 68614b83c0 as part of
the work in #13187. But there was no easy way to know this because the
logging issue meant no exceptions showed up anywhere reasonable.
- Relatedly, and also as part of #13187, we jumped from
`prometheus-client` 0.7.1 up to 0.15.0. This included a breaking
change where a `Counter` ending with `_total` will clash with a
`Gauge` of the same name but without `_total`. I am not 100% sure of
the reasoning here, other than "OpenMetrics compatibility".
Refs #13301
Refs #13187
Signed-off-by: Rick Elrod <rick@elrod.me>
In #13200 the dev env was changed to make `verifysignature` optional,
dependent on a variable set before ansible gets run to set up the
`docker-compose` environment.
However along with that change, a change to the execution node install
bundle slipped in, which is seemingly unrelated to the dev env change
and is breaking some installs: #13234, ansible/awx-operator#1132.
I think this change was unintentional as it would at least require
another change in ansible/receptor-collection and maybe a change in
ansible/awx-operator as well.
Signed-off-by: Rick Elrod <rick@elrod.me>
This will allow users of the operator to set these settings
so from the start when the operator creates the default
execution queue they can control the max_forks and max_concurrent_jobs
on the default container group.
The intention of this feature is primarily to provide some notion of max
capacity of container groups, but I've left the logic generic. Default
is 0, which will be interpreted as no maximum number of jobs or forks.
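A minimal sketch of how a 0 default reads as "no limit"; the names are illustrative:
```python
def has_capacity(running_jobs: int, consumed_forks: int, task_forks: int,
                 max_concurrent_jobs: int = 0, max_forks: int = 0) -> bool:
    if max_concurrent_jobs and running_jobs + 1 > max_concurrent_jobs:
        return False
    if max_forks and consumed_forks + task_forks > max_forks:
        return False
    return True  # either limit at 0 is interpreted as unlimited
```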
Includes refactor of variable and method names for clarity.
instances_by_hostname is an internal attribute of TaskManagerInstances.
Clarify when we are expecting the actual TaskManagerInstances object.
Unify how we process running tasks and consume capacity. This has the
effect that we do less expensive work in after_lock_init and have 1 less
loop over all the running tasks. Previously we looped for both building
the dependency graph as well as for calculating the starting capacity of
all the instances and instance groups. Now we achieve both tasks in the
same loop.
Because of how this changes the somewhat subtle "do-si-do" of how to
initialize the Task Manager models, introduce a wrapper class that tries
to take some of that burden off of other areas where we re-use this like
in the serializer and the metrics. Also use this wrapper class to handle
niceties of how to track capacity consumption on instances and instance
groups.
Add tests for max_forks and max_concurrent_jobs
Fixup tests that use TaskManagerModels to accommodate changes.
assign ig before call to consume capacity
if we don't do it in that order, then we don't correctly account for
the container group jobs we are starting in the middle of the task
manager run
Since the original version of the migration a) invoked the .save()
method, and b) involved a model with a custom field that had a
post_save handler attached, this migration had a side-effect that
caused the codebase's version of the model to be used when the table
involved wasn't yet up to date. This triggers an UndefinedColumn error.
This change works around the problem by making use of queryset
.update() methods instead, which should avoid the post_save signal
trigger.
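A minimal sketch of the workaround, with placeholder model and migration names:
```python
from django.db import migrations

def forwards(apps, schema_editor):
    # Use the historical model from `apps`; queryset .update() issues a
    # plain SQL UPDATE and does not fire post_save handlers the way
    # instance .save() does.
    Instance = apps.get_model("main", "Instance")  # placeholder model
    Instance.objects.filter(enabled=False).update(capacity=0)

class Migration(migrations.Migration):
    dependencies = [("main", "0001_initial")]  # placeholder
    operations = [migrations.RunPython(forwards, migrations.RunPython.noop)]
```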
related to https://github.com/ansible/ansible/pull/78175
the way the GHA runner is built, Python runs with a mixed locale between the FS bits and the default encoding, which can cause unpredictable issues
adding the env var `LC_ALL: "C.UTF-8"` prevents flakiness due to locale issues
Signed-off-by: Hao Liu <haoli@redhat.com>
Removing all >= dependencies as these were upgraded past the >= version with the last update.
The following libraries were secondary imports and were removed from the requirements.in as we are past the version required to fix their CVEs:
* autobahn
* kubernetes
* pyjwt
* sqlparse
aioredis was superseded by redis
Someone referenced this directly but didn't add it to requirements.in. So when we upgraded channels-redis and it dropped aioredis, this started failing
Certs are generated on the host and there is currently an issue due to an openssl version mismatch between Fedora 36 and CentOS Stream 8 which causes:
tools_awx_1 | ERROR 2022/11/15 17:09:17 could not load signing key file: unknown block type PRIVATE KEY
tools_awx_1 | ERROR 2022/11/15 17:09:17 could not load signing key file: unknown block type PRIVATE KEY
* Facts scaling fixes for large inventory, timing issue
Move save of Ansible facts to before the job status changes
this is considered an acceptable delay with the other
performance fixes here
Remove completely unrelated unused facts method
Scale related changes to facts saving:
Use .iterator() on queryset when looping
Change save to bulk_update
Apply bulk_update in batches of 100, to reduce memory
Only save a single file modtime, avoiding large dict
Use decorator for long func time logging
update decorator to fill in format statement
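A minimal sketch of the batching pattern; the model and fields are illustrative:
```python
from itertools import islice

def save_facts(host_queryset, facts_by_id: dict, batch_size: int = 100) -> None:
    hosts = host_queryset.iterator()  # stream rows instead of loading them all
    while True:
        batch = list(islice(hosts, batch_size))
        if not batch:
            break
        for host in batch:
            host.ansible_facts = facts_by_id.get(host.pk, {})
        # One UPDATE round-trip per batch keeps memory use flat.
        host_queryset.model.objects.bulk_update(batch, ["ansible_facts"])
```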
* Fixes #13119 #13120 Cloud support & update brand
* rm base64 import to pass lint
* Update references across the board
* Removed final reference to CyberArk Conjur Secret Lookup
It was registered twice: once since it is defined as a CustomCommand
subclass, and once because it is an endpoint at the /api/v2/ level. With
Python 3.11, argparse has become more strict and will raise an exception
when you try to inject duplicate subparsers.
- enable schema upload to s3 bucket for feature branch
- add workflow to delete schema from s3 bucket when feature branch is deleted
Signed-off-by: Hao Liu <haoli@redhat.com>
Previously, in some cases, an InventoryUpdate sourced by an SCM project
would still run and be successful even after the project it is sourced
from failed to update. This would happen because the InventoryUpdate
would revert the project back to its last working revision. This
behavior is confusing and inconsistent with how we handle jobs (which
just refuse to launch when the project is failed).
This change pulls out the logic that the job launch serializer and
RunJob#pre_run_hook had implemented (independently) to check if the
project is in a failed state, and puts it into a method on the Project
model. This is then checked in the project launch serializer as well as
the inventory update serializer, along with
SourceControlMixin#sync_and_copy as a fallback for things that don't run
the serializer validation (such as scheduled jobs and WFJT jobs).
Signed-off-by: Rick Elrod <rick@elrod.me>
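A minimal sketch of the centralized check; the method name and statuses are illustrative:
```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Project:
    status: str
    scm_type: str

    def get_reason_if_failed(self) -> Optional[str]:
        # Shared by the launch serializers and the sync_and_copy fallback,
        # instead of each reimplementing the check.
        if self.scm_type and self.status in ("failed", "error", "never updated"):
            return "Missing a revision to run due to failed project update."
        return None  # safe to launch
```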
This takes some logic out of the queryset construction,
using some established assumptions about the task manager
if a job lands on a hybrid node (or is a project update) then
it will have the same controller and execution node
With that established, the queryset can be simplified
Really these could get any of the unified job template types, not just
system job templates, so importing e.g. a project with a schedule was
doing them in the wrong order.
Also, bump the timeout of the project update and make sure that we
stash it in the page cache even if it doesn't finish in 5 minutes.
when running `make ui-devel`. Previously they were going to
/awx_devel/awx/public/static, but that directory is no longer being
served up by nginx, which forced us to have to run `make
collectstatic` (or equivalent) to get the files to the right place.
- Hello, this appears to be less of a bug report or feature request and more of a question. Could you please ask this on our mailing list? See https://github.com/ansible/awx/#get-involved for information for ways to connect with us.
### Visit the Forum or Matrix
- Hello, this appears to be less of a bug report or feature request and more of a question. Could you please ask this on either the [Ansible AWX channel on Matrix](https://matrix.to/#/#awx:ansible.com) or the [Ansible Community Forum](https://forum.ansible.com/tag/awx)?
Thank you once again for this and your interest in AWX!
### Red Hat Support Team
- Hi! \
\
It appears that you are using an RPM build for RHEL. Please reach out to the Red Hat support team and submit a ticket. \
\
Here is the link to do so: \
\
https://access.redhat.com/support \
\
Thank you for your submission and for supporting AWX!
## Common
### Oracle AWX
We'd be happy to help if you can reproduce this with AWX, since we do not have Oracle's Linux Automation Manager. If you need help with this specific version of Oracle's Linux Automation Manager, you will need to contact Oracle for support.
### Community Resolved
Hi,
We are happy to see that it appears a fix has been provided for your issue, so we will go ahead and close this ticket. Please feel free to reopen if any other problems arise.
<name of community member who helped> thanks so much for taking the time to write a thoughtful and helpful response to this issue!
### AWX Release
Subject: Announcing AWX Xa.Ya.za and AWX-Operator Xb.Yb.zb
Early versions of AWX did not support seamless upgrades between major versions and required the use of a backup and restore tool to perform upgrades.
As of version 18.0, `awx-operator` is the preferred install/upgrade method. Users who wish to upgrade modern AWX installations should follow the instructions at:
`state:needs_info` The issue needs more information. This could be more debug output, or more specifics about the system, such as version information. Any detail that is currently preventing this issue from moving forward. This should be considered a blocked state.
`state:needs_review` The issue/pull request needs to be reviewed by other maintainers and contributors. This is usually used when there is a question out to another maintainer or when a person is less familiar with an area of the code base the issue is for.
`state:needs_revision` More commonly used on pull requests, this state represents that there are changes that are being waited on.
Currently you can expect awxbot to add common labels such as `state:needs_triage`, `type:bug`, `component:docs`, etc...
These labels are determined by the template data. Please use the template and fill it out as accurately as possible.
The `state:needs_triage` label will remain on your pull request until a person has looked at it.
You can also expect the bot to CC maintainers of specific areas of the code, this will notify them that there is a pull request by placing a comment on the pull request.
The comment will look something like `CC @matburt @wwitzel3 ...`.
To install AWX, please view the [Install guide](./INSTALL.md).
To learn more about using AWX, view the [AWX docs site](https://ansible.readthedocs.io/projects/awx/en/latest/).
The AWX Project Frequently Asked Questions can be found [here](https://www.ansible.com/awx-project-faq).
Code of Conduct
---------------
We ask all of our community members and contributors to adhere to the [Ansible code of conduct](http://docs.ansible.com/ansible/latest/community/code_of_conduct.html). If you have questions or need assistance, please reach out to our community team at [codeofconduct@ansible.com](mailto:codeofconduct@ansible.com)
Get Involved
------------
We welcome your feedback and ideas. Here's how to reach us with feedback and questions:
- Join the `#ansible-awx` channel on irc.libera.chat
- Join the [mailing list](https://groups.google.com/forum/#!forum/awx-project)
- Join the [Ansible AWX channel on Matrix](https://matrix.to/#/#awx:ansible.com)
- Join the [Ansible Community Forum](https://forum.ansible.com)
This endpoint allows the client to create multiple hosts and associate them with an inventory. They may do this by providing the inventory ID and a list of json that would normally be provided to create hosts.
This endpoint allows the client to launch multiple UnifiedJobTemplates at a time, alongside any launch-time parameters that they would normally set at launch time.
* `inventory_update`: ID of the inventory update job that was started. (integer, read-only)
* `project_update`: ID of the project update job that was started if this inventory source is an SCM source. (integer, read-only, optional)
Note: All manual inventory sources (source="") will be ignored by the update_inventory_sources endpoint. This endpoint will not update inventory sources for Smart Inventories.