Edit AWX docs

This commit is contained in:
beeankha
2019-09-11 16:27:28 -04:00
parent 14cc203945
commit e2be392f31
16 changed files with 283 additions and 526 deletions

View File

@@ -1,19 +1,19 @@
## Ansible Runner Integration Overview
Much of the code in AWX around ansible and ansible-playbook invocation interacting has been removed and put into the project ansible-runner. AWX now calls out to ansible-runner to invoke ansible and ansible-playbook.
Much of the code in AWX around Ansible and `ansible-playbook` invocation has been removed and put into the project `ansible-runner`. AWX now calls out to `ansible-runner` to invoke Ansible and `ansible-playbook`.
### Lifecycle
In AWX, a task of a certain job type is kicked off (i.e. RunJob, RunProjectUpdate, RunInventoryUpdate, etc) in tasks.py. A temp directory is build to house ansible-runner parameters (i.e. envvars, cmdline, extravars, etc.). The temp directory is filled with the various concepts in AWX (i.e. ssh keys, extra varsk, etc.). The code then builds a set of parameters to be passed to the ansible-runner python module interface, `ansible-runner.interface.run()`. This is where AWX passes control to ansible-runner. Feedback is gathered by AWX via callbacks and handlers passed in.
In AWX, a task of a certain job type is kicked off (_e.g._, RunJob, RunProjectUpdate, RunInventoryUpdate, etc.) in `tasks.py`. A temp directory is built to house `ansible-runner` parameters (_e.g._, `envvars`, `cmdline`, `extravars`, etc.). This temp directory is filled with the various concepts in AWX (_e.g._, SSH keys, extra vars, etc.). The code then builds a set of parameters to be passed to the `ansible-runner` Python module interface, `ansible-runner.interface.run()`. This is where AWX passes control to `ansible-runner`. Feedback is gathered by AWX via callbacks and handlers passed in.
The callbacks and handlers are:
* event_handler: Called each time a new event is created in ansible runner. AWX will disptach the event to rabbitmq to be processed on the other end by the callback receiver.
* cancel_callback: Called periodically by ansible runner. This is so that AWX can inform ansible runner if the job should be canceled or not.
* finished_callback: Called once by ansible-runner to denote that the process that was asked to run is finished. AWX will construct the special control event, `EOF`, with an associated total number of events that it observed.
* status_handler: Called by ansible-runner as the process transitions state internally. AWX uses the `starting` status to know that ansible-runner has made all of its decisions around the process that it will launch. AWX gathers and associates these decisions with the Job for historical observation.
* `event_handler`: Called each time a new event is created in `ansible-runner`. AWX will dispatch the event to `rabbitmq` to be processed on the other end by the callback receiver.
* `cancel_callback`: Called periodically by `ansible-runner`; this is so that AWX can inform `ansible-runner` if the job should be canceled or not.
* `finished_callback`: Called once by `ansible-runner` to denote that the process that was asked to run is finished. AWX will construct the special control event, `EOF`, with the associated total number of events that it observed.
* `status_handler`: Called by `ansible-runner` as the process transitions state internally. AWX uses the `starting` status to know that `ansible-runner` has made all of its decisions around the process that it will launch. AWX gathers and associates these decisions with the Job for historical observation.
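Below is a minimal, hypothetical sketch (not AWX's actual code, which lives in `tasks.py`) of how these four handlers can be wired into the `ansible-runner` Python interface; the playbook name and private data directory are placeholders.
```python
# A minimal sketch of wiring the four callbacks into ansible-runner's Python
# interface; the playbook and private data directory below are placeholders.
import ansible_runner.interface


def event_handler(event_data):
    # AWX dispatches each event to the callback receiver; here we just print.
    print("event:", event_data.get("event"))


def cancel_callback():
    # Return True to tell ansible-runner that the job should be canceled.
    return False


def finished_callback(runner):
    # AWX emits its special EOF control event at this point.
    print("finished with status:", runner.status)


def status_handler(status_data, runner_config=None):
    # AWX records the 'starting' status to capture launch decisions.
    print("status changed:", status_data.get("status"))


runner = ansible_runner.interface.run(
    private_data_dir="/tmp/example_private_data_dir",  # placeholder temp directory
    playbook="site.yml",                               # placeholder playbook
    event_handler=event_handler,
    cancel_callback=cancel_callback,
    finished_callback=finished_callback,
    status_handler=status_handler,
)
print(runner.rc, runner.status)
```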
### Debugging
If you want to debug ansible-runner then set `AWX_CLEANUP_PATHS=False`, run a job, observe the job's `AWX_PRIVATE_DATA_DIR` property, and go the node where the job was executed and inspect that directory.
If you want to debug `ansible-runner`, then set `AWX_CLEANUP_PATHS=False`, run a job, observe the job's `AWX_PRIVATE_DATA_DIR` property, and go to the node where the job was executed and inspect that directory.
If you want to debug the process that ansible-runner invoked (i.e. ansible or ansible-playbook) then observe the job's job_env, job_cwd, and job_args parameters.
If you want to debug the process that `ansible-runner` invoked (_i.e._, Ansible or `ansible-playbook`), then observe the Job's `job_env`, `job_cwd`, and `job_args` parameters.
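As a hedged illustration of that last point, the snippet below fetches a Job's detail record over the API and prints those fields; the host, token, and job ID are placeholders, and `verify=False` is only for test setups without a CA.
```python
# Hypothetical example: read job_cwd, job_args, and job_env from the job
# detail endpoint. Host, token, and job ID are placeholders.
import requests

AWX_HOST = "https://awx.example.com"   # placeholder
TOKEN = "<token-value>"                # placeholder
JOB_ID = 42                            # placeholder

resp = requests.get(
    f"{AWX_HOST}/api/v2/jobs/{JOB_ID}/",
    headers={"Authorization": f"Bearer {TOKEN}"},
    verify=False,  # only for test setups without a CA
)
job = resp.json()
print(job.get("job_cwd"))
print(job.get("job_args"))
print(job.get("job_env"))
```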

View File

@@ -7,18 +7,18 @@ When a user wants to log into Tower, she can explicitly choose some of the suppo
* Github Team OAuth2
* Microsoft Azure Active Directory (AD) OAuth2
On the other hand, the rest of authentication methods use the same types of login info as Tower(username and password), but authenticate using external auth systems rather than Tower's own database. If some of these methods are enabled, Tower will try authenticating using the enabled methods *before Tower's own authentication method*. In specific, it follows the order
On the other hand, the other authentication methods use the same types of login info as Tower (username and password), but authenticate using external auth systems rather than Tower's own database. If some of these methods are enabled, Tower will try authenticating using the enabled methods *before Tower's own authentication method*. The order of precedence is:
* LDAP
* RADIUS
* TACACS+
* SAML
Tower will try authenticating against each enabled authentication method *in the specified order*, meaning if the same username and password is valid in multiple enabled auth methods (For example, both LDAP and TACACS+), Tower will only use the first positive match (In the above example, log a user in via LDAP and skip TACACS+).
Tower will try authenticating against each enabled authentication method *in the specified order*, meaning if the same username and password is valid in multiple enabled auth methods (*e.g.*, both LDAP and TACACS+), Tower will only use the first positive match (in the above example, log a user in via LDAP and skip TACACS+).
## Notes:
* SAML users, RADIUS users and TACACS+ users are categorized as 'Enterprise' users. The following rules apply to Enterprise users:
SAML users, RADIUS users and TACACS+ users are categorized as 'Enterprise' users. The following rules apply to Enterprise users:
* Enterprise users can only be created via the first successful login attempt from the remote authentication backend.
* Enterprise users cannot be created or authenticated if a non-enterprise user with the same name has already been created in Tower.
* Tower passwords of Enterprise users should always be empty and cannot be set by any user while enterprise backends are enabled.
* If enterprise backends are disabled, an Enterprise user can be converted to a normal Tower user by setting password field. But this operation is irreversible (The converted Tower user can no longer be treated as Enterprise user)
* If enterprise backends are disabled, an Enterprise user can be converted to a normal Tower user by setting the password field. However, this operation is irreversible (the converted Tower user can no longer be treated as an Enterprise user).

View File

@@ -1,18 +1,21 @@
# LDAP
The Lightweight Directory Access Protocol (LDAP) is an open, vendor-neutral, industry standard application protocol for accessing and maintaining distributed directory information services over an Internet Protocol (IP) network. Directory services play an important role in developing intranet and Internet applications by allowing the sharing of information about users, systems, networks, services, and applications throughout the network.
The Lightweight Directory Access Protocol (LDAP) is an open, vendor-neutral, industry-standard application protocol for accessing and maintaining distributed directory information services over an Internet Protocol (IP) network. Directory services play an important role in developing intranet and Internet applications by allowing the sharing of information about users, systems, networks, services, and applications throughout the network.
# Configure LDAP Authentication
Please see the Tower documentation as well as Ansible blog posts for basic LDAP configuration.
Please see the [Tower documentation](https://docs.ansible.com/ansible-tower/latest/html/administration/ldap_auth.html) as well as the [Ansible blog post](https://www.ansible.com/blog/getting-started-ldap-authentication-in-ansible-tower) for basic LDAP configuration.
LDAP Authentication provides duplicate sets of configuration fields for authentication with up to six different LDAP servers.
The default set of configuration fields take the form `AUTH_LDAP_<field name>`. Configuration fields for additional ldap servers are numbered `AUTH_LDAP_<n>_<field name>`.
## Test environment setup
Please see README.md of this repository: https://github.com/jangsutsr/deploy_ldap.git.
The default set of configuration fields takes the form `AUTH_LDAP_<field name>`. Configuration fields for additional LDAP servers are numbered `AUTH_LDAP_<n>_<field name>`.
# Basic setup for FreeIPA
## Test Environment Setup
Please see `README.md` of this repository: https://github.com/jangsutsr/deploy_ldap.git.
# Basic Setup for FreeIPA
LDAP Server URI (append if you have multiple LDAPs)
`ldaps://{{serverip1}}:636`
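To illustrate the `AUTH_LDAP_<n>_<field name>` numbering convention mentioned above, here is a hypothetical settings excerpt configuring a default LDAP server plus one additional server; the URIs and bind DNs are placeholders.
```python
# Hypothetical settings excerpt: the default LDAP server uses the un-numbered
# fields, while a second server uses the AUTH_LDAP_1_ prefix.
AUTH_LDAP_SERVER_URI = "ldaps://ldap1.example.com:636"   # placeholder
AUTH_LDAP_BIND_DN = "uid=awx,cn=users,cn=accounts,dc=example,dc=com"  # placeholder

AUTH_LDAP_1_SERVER_URI = "ldaps://ldap2.example.com:636"  # placeholder
AUTH_LDAP_1_BIND_DN = "uid=awx,cn=users,cn=accounts,dc=example,dc=com"  # placeholder
```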

View File

@@ -1,16 +1,16 @@
## Introduction
Starting from Tower 3.3, OAuth 2 will be used as the new means of token-based authentication. Users
will be able to manage OAuth 2 tokens as well as applications, a server-side representation of API
clients used to generate tokens. With OAuth 2, a user can authenticate by passing a token as part of
clients used to generate tokens. With OAuth 2, a user can authenticate by passing a token as part of
the HTTP authentication header. The token can be scoped to have more restrictive permissions on top of
the base RBAC permissions of the user. Refer to [RFC 6749](https://tools.ietf.org/html/rfc6749) for
the base RBAC permissions of the user. Refer to [RFC 6749](https://tools.ietf.org/html/rfc6749) for
more details of OAuth 2 specification.
## Basic Usage
To get started using OAuth 2 tokens for accessing the browsable API using OAuth 2, we will walkthrough acquiring a token, and using it.
To get started using OAuth 2 tokens for accessing the browsable API using OAuth 2, this document will walk through the steps of acquiring a token and using it.
1. Make an application with authorization_grant_type set to 'password'. HTTP POST the following to the `/api/v2/applications/` endpoint (supplying your own organization-id):
1. Make an application with `authorization_grant_type` set to 'password'. HTTP POST the following to the `/api/v2/applications/` endpoint (supplying your own `organization-id`):
```
{
"name": "Admin Internal Application",
@@ -22,7 +22,7 @@ To get started using OAuth 2 tokens for accessing the browsable API using OAuth
"organization": <organization-id>
}
```
2. Make a token with a POST to the `/api/v2/tokens/` endpoint:
2. Make a token with a POST to the `/api/v2/tokens/` endpoint:
```
{
"description": "My Access Token",
@@ -32,13 +32,13 @@ To get started using OAuth 2 tokens for accessing the browsable API using OAuth
```
This will return a `<token-value>` that you can use to authenticate future requests (this value will not be shown again).
3. Use token to access a resource. We will use curl to demonstrate this:
3. Use the token to access a resource. We will use `curl` to demonstrate this:
```
curl -H "Authorization: Bearer <token-value>" -X GET https://<awx>/api/v2/users/
```
> The `-k` flag may be needed if you have not set up a CA yet and are using SSL.
This token can be revoked by making a DELETE on the detail page for that token. All you need is that token's id. For example:
This token can be revoked by making a DELETE on the detail page for that token. All you need is that token's id. For example:
```
curl -ku <user>:<password> -X DELETE https://<awx>/api/v2/tokens/<pk>/
```
@@ -48,15 +48,17 @@ Similarly, using a token:
curl -H "Authorization: Bearer <token-value>" -X DELETE https://<awx>/api/v2/tokens/<pk>/ -k
```
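For scripted use, the same walkthrough can be expressed with Python's `requests` library. This is a hedged sketch: the host and credentials are placeholders, `verify=False` stands in for the `-k` flag, and the response field names (`token`, `id`) are assumed from the walkthrough above.
```python
# Hedged end-to-end sketch of the token walkthrough above using requests.
import requests

AWX_HOST = "https://awx.example.com"   # placeholder
AUTH = ("admin", "password")           # placeholder user/password

# 1. Create a personal access token (no application linked).
resp = requests.post(
    f"{AWX_HOST}/api/v2/tokens/",
    auth=AUTH,
    json={"description": "My Access Token", "application": None, "scope": "write"},
    verify=False,
)
token = resp.json()
token_value, token_id = token["token"], token["id"]  # assumed response fields

# 2. Use the token to access a resource.
users = requests.get(
    f"{AWX_HOST}/api/v2/users/",
    headers={"Authorization": f"Bearer {token_value}"},
    verify=False,
)
print(users.status_code)

# 3. Revoke the token by deleting it.
requests.delete(f"{AWX_HOST}/api/v2/tokens/{token_id}/", auth=AUTH, verify=False)
```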
## More Information
#### Managing OAuth 2 applications and tokens
Applications and tokens can be managed as a top-level resource at `/api/<version>/applications` and
`/api/<version>/tokens`. These resources can also be accessed respective to the user at
#### Managing OAuth 2 Applications and Tokens
Applications and tokens can be managed as a top-level resource at `/api/<version>/applications` and
`/api/<version>/tokens`. These resources can also be accessed respective to the user at
`/api/<version>/users/N/<resource>`. Applications can be created by making a POST to either `/api/<version>/applications`
or `/api/<version>/users/N/applications`.
Each OAuth 2 application represents a specific API client on the server side. For an API client to use the API via an application token,
Each OAuth 2 application represents a specific API client on the server side. For an API client to use the API via an application token,
it must first have an application and issue an access token.
Individual applications will be accessible via their primary keys:
@@ -111,22 +113,20 @@ generated during creation; Fields `user` and `authorization_grant_type`, on the
*immutable on update*, meaning they are required fields on creation, but will become read-only after
that.
On RBAC side:
- system admins will be able to see and manipulate all applications in the system;
**On the RBAC side:**
- System admins will be able to see and manipulate all applications in the system;
- Organization admins will be able to see and manipulate all applications belonging to Organization
members;
- Other normal users will only be able to see, update and delete their own applications, but
cannot create any new applications.
Tokens, on the other hand, are resources used to actually authenticate incoming requests and mask the
permissions of the underlying user. Tokens can be created by POSTing to the `/api/v2/tokens/`
endpoint and providing the `application` and `scope` fields to point to the related application and specify the
token scope, or by POSTing to `/api/v2/applications/<pk>/tokens/` and providing only `scope`, in which case
the parent application will be automatically linked.
Individual tokens will be accessible via their primary keys:
Individual tokens will be accessible via their primary keys at
`/api/<version>/tokens/<pk>/`. Here is a typical token:
```
{
@@ -162,18 +162,19 @@ Individual tokens will be accessible via their primary keys:
"scope": "read"
},
```
For an OAuth 2 token, the only fully mutable fields are `scope` and `description`. The `application`
field is *immutable on update*, and all other fields are totally immutable, and will be auto-populated
during creation
* `user` field corresponds to the user the token is created for
For an OAuth 2 token, the only fully mutable fields are `scope` and `description`. The `application`
field is *immutable on update*; all other fields are totally immutable and will be auto-populated
during creation.
* `user` - this field corresponds to the user the token is created for
* `expires` will be generated according to the Tower configuration setting `OAUTH2_PROVIDER`
* `token` and `refresh_token` will be auto-generated to be non-clashing random strings.
Both application tokens and personal access tokens will be shown at the `/api/v2/tokens/`
Both application tokens and personal access tokens will be shown at the `/api/v2/tokens/`
endpoint. Personal access tokens can be identified by the `application` field being `null`.
On RBAC side:
**On the RBAC side:**
- A user will be able to create a token if they are able to see the related application;
- System admin is able to see and manipulate every token in the system;
- The System Administrator is able to see and manipulate every token in the system;
- Organization admins will be able to see and manipulate all tokens belonging to Organization
members;
System Auditors can see all tokens and applications.
@@ -196,7 +197,7 @@ curl -H "Authorization: Bearer kqHqxfpHGRRBXLNCOXxT5Zt3tpJogn" http://<awx>/api/
According to OAuth 2 specification, users should be able to acquire, revoke and refresh an access
token. In AWX the equivalent, and easiest, way of doing that is creating a token, deleting
a token, and deleting a token quickly followed by creating a new one.
a token, and deleting a token quickly followed by creating a new one.
The specification also provides standard ways of doing this. RFC 6749 elaborates
on those topics, but in summary, an OAuth 2 token is officially acquired via authorization using
@@ -211,7 +212,9 @@ endpoints under `/api/o/` endpoint. Detailed examples on the most typical usage
are available as description text of `/api/o/`. See below for information on Application Access Token usage.
> Note: The `/api/o/` endpoints can only be used for application tokens, and are not valid for personal access tokens.
#### Token scope mask over RBAC system
#### Token Scope Mask Over RBAC System
The scope of an OAuth 2 token is a space-separated string composed of keywords like 'read' and 'write'.
These keywords are configurable and used to specify the permission level of the authenticated API client.
For the initial OAuth 2 implementation, we use the simplest scope configuration, where the only
@@ -225,7 +228,7 @@ For example, if a user has admin permission to a job template, he/she can both s
and delete the job template if authenticated via session or basic auth. On the other hand, if the user
is authenticated using an OAuth 2 token, and the related token scope is 'read', the user can only see but
not manipulate or launch the job template, despite being an admin. If the token scope is
'write' or 'read write', she can take full advantage of the job template as its admin. Note, that 'write'
'write' or 'read write', she can take full advantage of the job template as its admin. Note that 'write'
implies 'read' as well.
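A hedged illustration of the scope mask: with a 'read'-scoped token, viewing a job template succeeds while launching it is rejected, even for an admin of that resource. The host, token, and job template ID are placeholders.
```python
# Hypothetical demonstration of a 'read'-scoped token hitting the scope mask.
import requests

AWX_HOST = "https://awx.example.com"   # placeholder
READ_TOKEN = "<read-scoped-token>"     # placeholder
JT_ID = 7                              # placeholder job template id
headers = {"Authorization": f"Bearer {READ_TOKEN}"}

# Viewing the job template is allowed by the 'read' scope.
view = requests.get(f"{AWX_HOST}/api/v2/job_templates/{JT_ID}/", headers=headers, verify=False)
print(view.status_code)    # expected 200

# Launching it requires 'write' scope, so this is expected to be denied (e.g. 403).
launch = requests.post(f"{AWX_HOST}/api/v2/job_templates/{JT_ID}/launch/", headers=headers, verify=False)
print(launch.status_code)
```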
@@ -235,14 +238,15 @@ This page lists OAuth 2 utility endpoints used for authorization, token refresh
Note that endpoints other than `/api/o/authorize/` are not meant to be used in browsers and do not
support HTTP GET. The endpoints here strictly follow
[RFC specs for OAuth 2](https://tools.ietf.org/html/rfc6749), so please use that for detailed
reference. Here we give some examples to demonstrate the typical usage of these endpoints in
AWX context (Note AWX net location default to `http://localhost:8013` in examples):
reference. Below are some examples to demonstrate the typical usage of these endpoints in
an AWX context (note that the AWX net location defaults to `http://localhost:8013` in these examples).
#### Application using `authorization code` grant type
#### Application Using `authorization code` Grant Type
This application grant type is intended to be used when the application is executing on the server. To create
an application named `AuthCodeApp` with the `authorization-code` grant type,
Make a POST to the `/api/v2/applications/` endpoint.
an application named `AuthCodeApp` with the `authorization-code` grant type,
make a POST to the `/api/v2/applications/` endpoint:
```text
{
"name": "AuthCodeApp",
@@ -253,21 +257,22 @@ Make a POST to the `/api/v2/applications/` endpoint.
"skip_authorization": false
}
```
You can test the authorization flow out with this new application by copying the client_id and URI link into the
homepage [here](http://django-oauth-toolkit.herokuapp.com/consumer/) and click submit. This is just a simple test
application Django-oauth-toolkit provides.
You can test out the authorization flow with this new application by copying the `client_id` and URI link into the
homepage [here](http://django-oauth-toolkit.herokuapp.com/consumer/) and clicking submit. This is just a simple test
application that `django-oauth-toolkit` provides.
From the client app, the user makes a GET to the Authorize endpoint with the `response_type`,
From the client app, the user makes a GET to the Authorize endpoint with the `response_type`,
`client_id`, `redirect_uris`, and `scope`. AWX will respond with the authorization `code` and `state`
to the redirect_uri specified in the application. The client application will then make a POST to the
`api/o/token/` endpoint on AWX with the `code`, `client_id`, `client_secret`, `grant_type`, and `redirect_uri`.
to the `redirect_uri` specified in the application. The client application will then make a POST to the
`api/o/token/` endpoint on AWX with the `code`, `client_id`, `client_secret`, `grant_type`, and `redirect_uri`.
AWX will respond with the `access_token`, `token_type`, `refresh_token`, and `expires_in`. For more
information on testing this flow, refer to [django-oauth-toolkit](http://django-oauth-toolkit.readthedocs.io/en/latest/tutorial/tutorial_01.html#test-your-authorization-server).
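As a rough, hedged sketch of that final exchange step (placeholders for the host, client credentials, authorization code, and redirect URI):
```python
# Hedged sketch: exchange the authorization code for an access token at
# /api/o/token/. All values below are placeholders.
import requests

AWX_HOST = "https://awx.example.com"   # placeholder
CLIENT_ID = "<client_id>"              # placeholder
CLIENT_SECRET = "<client_secret>"      # placeholder

resp = requests.post(
    f"{AWX_HOST}/api/o/token/",
    auth=(CLIENT_ID, CLIENT_SECRET),
    data={
        "grant_type": "authorization_code",
        "code": "<authorization-code>",                          # placeholder, returned to the redirect_uri
        "redirect_uri": "http://localhost/consumer/exchange/",   # placeholder
    },
    verify=False,
)
print(resp.json())  # access_token, token_type, refresh_token, expires_in
```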
#### Application using `password` grant type
#### Application Using `password` Grant Type
This is also called the `resource owner credentials grant`. This is for use by users who have
native access to the web app. This should be used when the client is the Resource owner. Suppose
native access to the web app. This should be used when the client is the resource owner. Suppose
we have an application `Default Application` with grant type `password`:
```text
{
@@ -285,7 +290,7 @@ we have an application `Default Application` with grant type `password`:
}
```
Log in is not required for `password` grant type, so we can simply use `curl` to acquire a personal access token
Login is not required for the `password` grant type, so we can simply use `curl` to acquire a personal access token
via `/api/o/token/`:
```bash
curl -X POST \
@@ -294,12 +299,12 @@ curl -X POST \
IaUBsaVDgt2eiwOGe0bg5m5vCSstClZmtdy359RVx2rQK5YlIWyPlrolpt2LEpVeKXWaiybo" \
http://<awx>/api/o/token/ -i
```
In the above post request, parameters `username` and `password` are username and password of the related
In the above POST request, parameters `username` and `password` are the username and password of the related
AWX user of the underlying application, and the authentication information is of the format
`<client_id>:<client_secret>`, where `client_id` and `client_secret` are the corresponding fields of the
underlying application.
Upon success, access token, refresh token and other information are given in the response body in JSON
Upon success, the access token, refresh token and other information are given in the response body in JSON
format:
```text
HTTP/1.1 200 OK
@@ -317,9 +322,11 @@ Strict-Transport-Security: max-age=15768000
{"access_token": "9epHOqHhnXUcgYK8QanOmUQPSgX92g", "token_type": "Bearer", "expires_in": 315360000000, "refresh_token": "jMRX6QvzOTf046KHee3TU5mT3nyXsz", "scope": "read"}
```
## Token Functions
#### Refresh an existing access token
#### Refresh an Existing Access Token
Suppose we have an existing access token with refresh token provided:
```text
{
@@ -334,14 +341,14 @@ Suppose we have an existing access token with refresh token provided:
"scope": "read write"
}
```
The `/api/o/token/` endpoint is used for refreshing access token:
The `/api/o/token/` endpoint is used for refreshing the access token:
```bash
curl -X POST \
-d "grant_type=refresh_token&refresh_token=AL0NK9TTpv0qp54dGbC4VUZtsZ9r8z" \
-u "gwSPoasWSdNkMDtBN3Hu2WYQpPWCO9SwUEsKK22l:fI6ZpfocHYBGfm1tP92r0yIgCyfRdDQt0Tos9L8a4fNsJjQQMwp9569eIaUBsaVDgt2eiwOGe0bg5m5vCSstClZmtdy359RVx2rQK5YlIWyPlrolpt2LEpVeKXWaiybo" \
http://<awx>/api/o/token/ -i
```
In the above post request, `refresh_token` is provided by `refresh_token` field of the access token
In the above POST request, `refresh_token` is provided by the `refresh_token` field of the access token
above. The authentication information is of the format `<client_id>:<client_secret>`, where `client_id`
and `client_secret` are the corresponding fields of the underlying related application of the access token.
@@ -364,12 +371,14 @@ Strict-Transport-Security: max-age=15768000
```
Internally, the refresh operation deletes the existing token and a new token is created immediately
after, with information like scope and related application identical to the original one. We can
verify by checking the new token is present and the old token is deleted at the /api/v2/tokens/ endpoint.
verify this by checking that the new token is present and the old token is deleted at the `/api/v2/tokens/` endpoint.
#### Revoke an access token
##### Alternatively Revoke using the /api/o/revoke-token/ endpoint
Revoking an access token by this method is the same as deleting the token resource object, but it allows you to delete a token by providing its token value, and the associated `client_id` (and `client_secret` if the application is `confidential`). For example:
#### Revoke an Access Token
##### Alternatively Revoke Using the `/api/o/revoke-token/` Endpoint
Revoking an access token by this method is the same as deleting the token resource object, but it allows you to delete a token by providing its token value, and the associated `client_id` (and `client_secret` if the application is `confidential`). For example:
```bash
curl -X POST -d "token=rQONsve372fQwuc2pn76k3IHDCYpi7" \
-u "gwSPoasWSdNkMDtBN3Hu2WYQpPWCO9SwUEsKK22l:fI6ZpfocHYBGfm1tP92r0yIgCyfRdDQt0Tos9L8a4fNsJjQQMwp9569eIaUBsaVDgt2eiwOGe0bg5m5vCSstClZmtdy359RVx2rQK5YlIWyPlrolpt2LEpVeKXWaiybo" \
@@ -377,17 +386,12 @@ curl -X POST -d "token=rQONsve372fQwuc2pn76k3IHDCYpi7" \
```
`200 OK` means a successful delete.
We can verify the effect by checking if the token is no longer present
at /api/v2/tokens/.
We can verify the effect by checking if the token is no longer present
at `/api/v2/tokens/`.
## Acceptance Criteria
* All CRUD operations for OAuth 2 applications and tokens should function as described.
* RBAC rules applied to OAuth 2 applications and tokens should behave as described.
* A default application should be auto-created for each new user.
@@ -396,4 +400,4 @@ at /api/v2/tokens/.
* Token scope mask over RBAC should work as described.
* Tower configuration setting `OAUTH2_PROVIDER` should be configurable and function as described.
* The `/api/o/` endpoint should work as expected. Specifically, all examples given in the description
help text should be working (user following the steps should get expected result).
help text should be working (a user following the steps should get the expected result).

View File

@@ -1,20 +1,23 @@
# SAML
Security Assertion Markup Language, or SAML, is an open standard for exchanging authentication and/or authorization data between an identity provider (i.e. LDAP) and a service provider (i.e. AWX). More concretely, AWX can be configured to talk with SAML in order to authenticate (create/login/logout) users of AWX. User Team and Organization membership can be embedded in the SAML response to AWX.
Security Assertion Markup Language, or SAML, is an open standard for exchanging authentication and/or authorization data between an identity provider (*e.g.*, LDAP) and a service provider (*e.g.*, AWX). More concretely, AWX can be configured to talk with SAML in order to authenticate (create/login/logout) users of AWX. User Team and Organization membership can be embedded in the SAML response to AWX.
# Configure SAML Authentication
Please see the Tower documentation as well as Ansible blog posts for basic SAML configuration. Note that AWX's SAML implementation relies on python-social-auth which uses python-saml. AWX exposes 3 fields that are directly passed to the lower libraries:
Please see the [Tower documentation](https://docs.ansible.com/ansible-tower/latest/html/administration/ent_auth.html#saml-authentication-settings) as well as the [Ansible blog post](https://www.ansible.com/blog/using-saml-with-red-hat-ansible-tower) for basic SAML configuration. Note that AWX's SAML implementation relies on `python-social-auth` which uses `python-saml`. AWX exposes three fields which are directly passed to the lower libraries:
* `SOCIAL_AUTH_SAML_SP_EXTRA` is passed to the `python-saml` library configuration's `sp` setting.
* `SOCIAL_AUTH_SAML_SECURITY_CONFIG` is passed to the `python-saml` library configuration's `security` setting.
* `SOCIAL_AUTH_SAML_EXTRA_DATA`
See http://python-social-auth-docs.readthedocs.io/en/latest/backends/saml.html#advanced-settings for more information.
# Configure SAML for Team and Organization Membership
AWX can be configured to look for particular attributes that contain AWX Team and Organization membership to associate with users when they login to AWX. The attribute names are defined in AWX settings. Specifically, the authentication settings tab and SAML sub category fields *SAML Team Attribute Mapping* and *SAML Organization Attribute Mapping*. The meaning and usefulness of these settings is best motivated through example.
**Example SAML Organization Attribute Mapping**
# Configure SAML for Team and Organization Membership
AWX can be configured to look for particular attributes that contain AWX Team and Organization membership to associate with users when they log in to AWX. The attribute names are defined in AWX settings, specifically in the *SAML Team Attribute Mapping* and *SAML Organization Attribute Mapping* fields under the SAML subcategory of the authentication settings. The meaning and usefulness of these settings is best communicated through example.
### Example SAML Organization Attribute Mapping
Below is an example SAML attribute that embeds user organization membership in the attribute *member-of*.
```
<saml2:AttributeStatement>
<saml2:AttributeStatement>
<saml2:Attribute FriendlyName="member-of" Name="member-of" NameFormat="urn:oasis:names:tc:SAML:2.0:attrname-format:unspecified">
<saml2:AttributeValue>Engineering</saml2:AttributeValue>
<saml2:AttributeValue>IT</saml2:AttributeValue>
@@ -25,9 +28,9 @@ Below is an example SAML attribute that embeds user organization membership in t
<saml2:AttributeValue>IT</saml2:AttributeValue>
<saml2:AttributeValue>HR</saml2:AttributeValue>
</saml2:Attribute>
</saml2:AttributeStatement>
</saml2:AttributeStatement>
```
Below, the corresponding AWX configuration.
Below, the corresponding AWX configuration:
```
{
"saml_attr": "member-of",
@@ -36,16 +39,16 @@ Below, the corresponding AWX configuration.
"remove_admins": true
}
```
**saml_attr:** The saml attribute name where the organization array can be found.
**saml_attr:** The SAML attribute name where the organization array can be found.
**remove:** True to remove user from all organizations before adding the user to the list of Organizations. False to keep the user in whatever Organization(s) they are in while adding the user to the Organization(s) in the SAML attribute.
**remove:** Set this to `true` to remove a user from all organizations before adding the user to the list of Organizations. Set it to `false` to keep the user in whatever Organization(s) they are in while adding the user to the Organization(s) in the SAML attribute.
**saml_admin_attr:** The saml attribute name where the organization administrators array can be found.
**saml_admin_attr:** The SAML attribute name where the organization administrators' array can be found.
**remove_admins:** True to remove user from all organizations that it is admin before adding the user to the list of Organizations admins. False to keep the user in whatever Organization(s) they are in as admin while adding the user as an Organization administrator in the SAML attribute.
**remove_admins:** Set this to `true` to remove a user from all organizations that they are administrators of before adding the user to the list of Organizations admins. Set it to `false` to keep the user in whatever Organization(s) they are in as admin while adding the user as an Organization administrator in the SAML attribute.
**Example SAML Team Attribute Mapping**
Below is another example of a SAML attribute that contains a Team membership in a list.
### Example SAML Team Attribute Mapping
Below is another example of a SAML attribute that contains a Team membership in a list:
```
<saml:AttributeStatement>
<saml:Attribute
@@ -78,8 +81,8 @@ Below is another example of a SAML attribute that contains a Team membership in
]
}
```
**saml_attr:** The saml attribute name where the team array can be found.
**saml_attr:** The SAML attribute name where the team array can be found.
**remove:** True to remove user from all Teams before adding the user to the list of Teams. False to keep the user in whatever Team(s) they are in while adding the user to the Team(s) in the SAML attribute.
**remove:** Set this to `true` to remove a user from all Teams before adding the user to the list of Teams. Set this to `false` to keep the user in whatever Team(s) they are in while adding the user to the Team(s) in the SAML attribute.
**team_org_map:** An array of dictionaries of the form `{ "team": "<AWX Team Name>", "organization": "<AWX Org Name>" }` that defines mapping from AWX Team -> AWX Organization. This is needed because the same named Team can exist in multiple Organizations in Tower. The organization to which a team listed in a SAML attribute belongs to would be ambiguous without this mapping.
**team_org_map:** An array of dictionaries of the form `{ "team": "<AWX Team Name>", "organization": "<AWX Org Name>" }` which defines a mapping from AWX Team to AWX Organization. This is needed because a Team with the same name can exist in multiple Organizations in Tower. Without this mapping, the Organization to which a Team listed in a SAML attribute belongs would be ambiguous.

View File

@@ -1,7 +1,7 @@
## Ansible Tower Capacity Determination and Job Impact
The Ansible Tower capacity system determines how many jobs can run on an Instance given the amount of resources
available to the Instance and the size of the jobs that are running (referred herafter as `Impact`).
available to the Instance and the size of the jobs that are running (referred to hereafter as `Impact`).
The algorithm used to determine this is based entirely on two things:
* How much memory is available to the system (`mem_capacity`)
@@ -11,72 +11,74 @@ Capacity also impacts Instance Groups. Since Groups are composed of Instances, l
assigned to multiple Groups. This means that impact to one Instance can potentially affect the overall capacity of
other Groups.
Instance Groups (not Instances themselves) can be assigned to be used by Jobs at various levels (see clustering.md).
When the Task Manager is preparing its graph to determine which Group a Job will run on it will commit the capacity of
an Instance Group to a job that hasn't or isn't ready to start yet. (see task_manager_system.md)
Instance Groups (not Instances themselves) can be assigned to be used by Jobs at various levels (see [Tower Clustering/HA Overview](https://github.com/ansible/awx/blob/devel/docs/clustering.md)).
When the Task Manager is preparing its graph to determine which Group a Job will run on, it will commit the capacity of
an Instance Group to a Job that hasn't or isn't ready to start yet (see [Task Manager Overview](https://github.com/ansible/awx/blob/devel/docs/task_manager_system.md)).
Finally, if only one Instance is available, in smaller configurations, for a Job to run the Task Manager will allow that
Job to run on the Instance even if it would push the Instance over capacity. We do this as a way to guarantee that Jobs
themselves won't get clogged as a result of an under provisioned system.
Finally, if only one Instance is available (especially in smaller configurations) for a Job to run, the Task Manager will allow that
Job to run on the Instance even if it would push the Instance over capacity. We do this as a way to guarantee that jobs
themselves won't get clogged as a result of an under-provisioned system.
These concepts mean that, in general, Capacity and Impact is not a zero-sum system relative to Jobs and Instances/Instance Groups.
These concepts mean that, in general, Capacity and Impact is not a zero-sum system relative to Jobs and Instances/Instance Groups.
### Resource Determination For Capacity Algorithm
The capacity algorithms are defined in order to determine how many `forks` a system is capable of running simultaneously. This controls how
The capacity algorithms are defined in order to determine how many `forks` a system is capable of running at the same time. This controls how
many systems Ansible itself will communicate with simultaneously. Increasing the number of forks a Tower system is running will, in general,
allow jobs to run faster by performing more work in parallel. The tradeoff is that will increase the load on the system which could cause work
allow jobs to run faster by performing more work in parallel. The tradeoff is that this will increase the load on the system which could cause work
to slow down overall.
Tower can operate in two modes when determining capacity. `mem_capacity` (the default) will allow you to overcommit CPU resources while protecting the system
from running out of memory. If most of your work is not cpu-bound then selecting this mode will maximize the number of forks.
from running out of memory. If most of your work is not CPU-bound, then selecting this mode will maximize the number of forks.
#### Memory-Relative Capacity
`mem_capacity` is calculated relative to the amount of memory needed per-fork. Taking into account the overhead for Tower's internal components this comes out
to be about `100MB` per-fork. When considering the amount of memory available to Ansible jobs the capacity algorithm will reserve 2GB of memory to account
`mem_capacity` is calculated relative to the amount of memory needed per-fork. Taking into account the overhead for Tower's internal components, this comes out
to be about `100MB` per fork. When considering the amount of memory available to Ansible jobs, the capacity algorithm will reserve 2GB of memory to account
for the presence of other Tower services. The algorithm itself looks like this:
(mem - 2048) / mem_per_fork
As an example:
(4096 - 2048) / 100 == ~20
So a system with 4GB of memory would be capable of running 20 forks. The value `mem_per_fork` can be controlled by setting the Tower settings value
(or environment variable) `SYSTEM_TASK_FORKS_MEM`, which defaults to `100`.
#### CPU Relative Capacity
Often times Ansible workloads can be fairly cpu-bound. In these cases sometimes reducing the simultaneous workload allows more tasks to run faster and reduces
#### CPU-Relative Capacity
Oftentimes, Ansible workloads can be fairly CPU-bound. In these cases, sometimes reducing the simultaneous workload allows more tasks to run faster and reduces
the average time-to-completion of those jobs.
Just as the Tower `mem_capacity` algorithm uses the amount of memory need per-fork, the `cpu_capacity` algorithm looks at the amount of cpu resources is needed
per fork. The baseline value for this is `4` forks per-core. The algorithm itself looks like this:
Just as the Tower `mem_capacity` algorithm uses the amount of memory needed per-fork, the `cpu_capacity` algorithm looks at the amount of CPU resources needed
per fork. The baseline value for this is `4` forks per core. The algorithm itself looks like this:
cpus * fork_per_cpu
For example a 4-core system:
For example, in a 4-core system:
4 * 4 == 16
The value `fork_per_cpu` can be controlled by setting the Tower settings value (or environment variable) `SYSTEM_TASK_FORKS_CPU` which defaults to `4`.
The value `fork_per_cpu` can be controlled by setting the Tower settings value (or environment variable) `SYSTEM_TASK_FORKS_CPU`, which defaults to `4`.
### Job Impacts Relative To Capacity
When selecting the capacity it's important to understand how each job type affects capacity.
When selecting the capacity, it's important to understand how each job type affects it.
It's helpful to understand what `forks` mean to Ansible: http://docs.ansible.com/ansible/latest/intro_configuration.html#forks
The default forks value for ansible is `5`. However, if Tower knows that you're running against fewer systems than that then the actual concurrency value
The default forks value for Ansible is `5`. However, if Tower knows that you're running against fewer systems than that, then the actual concurrency value
will be lower.
When a job is run, Tower will add `1` to the number of forks selected to compensate for the Ansible parent process. So if you are running a playbook against `5`
systems with a `forks` value of `5` then the actual `forks` value from the perspective of Job Impact will be 6.
When a job runs, Tower will add `1` to the number of forks selected to compensate for the Ansible parent process. So if you are running a playbook against `5`
systems with a `forks` value of `5`, then the actual `forks` value from the perspective of Job Impact will be `6`.
#### Impact of Job types in Tower
#### Impact of Job Types in Tower
Jobs and Ad-hoc jobs follow the above model `forks + 1`.
Jobs and Ad-hoc jobs follow the above model, `forks + 1`.
Other job types have a fixed impact:
@@ -84,16 +86,15 @@ Other job types have a fixed impact:
* Project Updates: 1
* System Jobs: 5
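The sketch below is a simplified rendering of the impact arithmetic described above, not AWX's actual implementation; it only covers the job types listed here.
```python
# Simplified sketch of job impact: forks + 1 for jobs and ad hoc commands
# (bounded by the number of targeted hosts), fixed values for the other types.
def job_impact(job_type, forks=5, hosts=None):
    fixed = {"project_update": 1, "system_job": 5}
    if job_type in fixed:
        return fixed[job_type]
    # Jobs and ad hoc commands: effective forks never exceeds the host count.
    effective_forks = min(forks, hosts) if hosts is not None else forks
    return effective_forks + 1


print(job_impact("run", forks=5, hosts=5))   # 6, matching the example above
print(job_impact("project_update"))          # 1
```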
### Selecting the right capacity
### Selecting the Right Capacity
Selecting between a `memory` focused capacity algorithm and a `cpu` focused capacity for your Tower use means you'll be selecting between a minimum
and maximum value. In the above examples the CPU capacity would allow a maximum of 16 forks while the Memory capacity would allow 20. For some systems
the disparity between these can be large and often times you may want to have a balance between these two.
Selecting between a memory-focused capacity algorithm and a CPU-focused capacity for your Tower use means you'll be selecting between a minimum
and maximum value. In the above examples, the CPU capacity would allow a maximum of 16 forks while the Memory capacity would allow 20. For some systems,
the disparity between these can be large and oftentimes you may want to have a balance between these two.
An `Instance` field `capacity_adjustment` allows you to select how much of one or the other you want to consider. It is represented as a value between 0.0
and 1.0. If set to a value of `1.0` then the largest value will be used. In the above example, that would be Memory capacity so a value of `20` forks would
An Instance field, `capacity_adjustment`, allows you to select how much of one or the other you want to consider. It is represented as a value between `0.0`
and `1.0`. If set to a value of `1.0`, then the largest value will be used. In the above example, that would be Memory capacity, so a value of `20` forks would
be selected. If set to a value of `0.0`, then the smallest value will be used. A value of `0.5` would be a 50/50 balance between the two algorithms, which would
be `18`:
16 + (20 - 16) * 0.5 == 18
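Putting the memory, CPU, and `capacity_adjustment` arithmetic together, here is a small sketch (assuming the default `SYSTEM_TASK_FORKS_MEM=100` and `SYSTEM_TASK_FORKS_CPU=4`; not AWX's actual implementation):
```python
# Sketch of the capacity arithmetic described above, using the default
# per-fork memory (100MB) and forks-per-core (4) values.
def mem_capacity(mem_mb, mem_per_fork=100):
    return (mem_mb - 2048) // mem_per_fork


def cpu_capacity(cpus, forks_per_cpu=4):
    return cpus * forks_per_cpu


def blended_capacity(mem_mb, cpus, capacity_adjustment=0.5):
    low, high = sorted((mem_capacity(mem_mb), cpu_capacity(cpus)))
    return int(low + (high - low) * capacity_adjustment)


print(mem_capacity(4096))              # ~20 forks on a 4GB system
print(cpu_capacity(4))                 # 16 forks on a 4-core system
print(blended_capacity(4096, 4, 0.5))  # 18, matching the example above
```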

View File

@@ -85,9 +85,9 @@ hostC rabbitmq_host=10.1.0.3
- `rabbitmq_use_long_names` - RabbitMQ is pretty sensitive to what each instance is named. We are flexible enough to allow FQDNs (_host01.example.com_), short names (`host01`), or IP addresses (192.168.5.73). Depending on what is used to identify each host in the `inventory` file, this value may need to be changed. For FQDNs and IP addresses, this value needs to be `true`. For short names, it should be `false`.
- `rabbitmq_enable_manager` - Setting this to `true` will expose the RabbitMQ management web console on each instance.
The most important field to point out for variability is `rabbitmq_use_long_name`. This cannot be detected and no reasonable default is provided for it, so it's important to point out when it needs to be changed. If instances are provisioned to where they reference other instances internally and not on external addresses then `rabbitmq_use_long_name` semantics should follow the internal addressing (aka `rabbitmq_host`).
The most important field to point out for variability is `rabbitmq_use_long_name`. This cannot be detected and no reasonable default is provided for it, so it's important to point out when it needs to be changed. If instances are provisioned to where they reference other instances internally and not on external addresses, then `rabbitmq_use_long_name` semantics should follow the internal addressing (*i.e.*, `rabbitmq_host`).
Other than `rabbitmq_use_long_name` the defaults are pretty reasonable:
Other than `rabbitmq_use_long_name`, the defaults are pretty reasonable:
```
rabbitmq_port=5672
rabbitmq_vhost=tower
@@ -105,9 +105,9 @@ Recommendations and constraints:
- Do not name any instance the same as a group name.
### Security Isolated Rampart Groups
### Security-Isolated Rampart Groups
In Tower versions 3.2+ customers may optionally define isolated groups inside of security-restricted networking zones from which to run jobs and ad hoc commands. Instances in these groups will _not_ have a full install of Tower, but will have a minimal set of utilities used to run jobs. Isolated groups must be specified in the inventory file prefixed with `isolated_group_`. An example inventory file is shown below:
In Tower versions 3.2+, customers may optionally define isolated groups inside of security-restricted networking zones from which to run jobs and ad hoc commands. Instances in these groups will _not_ have a full install of Tower, but will have a minimal set of utilities used to run jobs. Isolated groups must be specified in the inventory file prefixed with `isolated_group_`. An example inventory file is shown below:
```
[tower]
@@ -154,18 +154,18 @@ Recommendations for system configuration with isolated groups:
Isolated Instance Authentication
--------------------------------
By default - at installation time - a randomized RSA key is generated and distributed as an authorized key to all "isolated" instances. The private half of the key is encrypted and stored within Tower, and is used to authenticat from "controller" instances to "isolated" instances when jobs are run.
At installation time, by default, a randomized RSA key is generated and distributed as an authorized key to all "isolated" instances. The private half of the key is encrypted and stored within Tower, and is used to authenticate from "controller" instances to "isolated" instances when jobs are run.
For users who wish to manage SSH authentication from controlling instances to isolated instances via some system _outside_ of Tower (such as externally-managed passwordless SSH keys), this behavior can be disabled by unsetting two Tower API settings values:
For users who wish to manage SSH authentication from controlling instances to isolated instances via some system _outside_ of Tower (such as externally-managed, password-less SSH keys), this behavior can be disabled by unsetting two Tower API settings values:
`HTTP PATCH /api/v2/settings/jobs/ {'AWX_ISOLATED_PRIVATE_KEY': '', 'AWX_ISOLATED_PUBLIC_KEY': ''}`
### Provisioning and Deprovisioning Instances and Groups
* **Provisioning** - Provisioning Instances after installation is supported by updating the `inventory` file and re-running the setup playbook. It's important that this file contain all passwords and information used when installing the cluster or other instances may be reconfigured (this could be intentional).
* **Provisioning** - Provisioning Instances after installation is supported by updating the `inventory` file and re-running the setup playbook. It's important that this file contain all passwords and information used when installing the cluster, or other instances may be reconfigured (this can be done intentionally).
* **Deprovisioning** - Tower does not automatically de-provision instances since it cannot distinguish between an instance that was taken offline intentionally or due to failure. Instead the procedure for deprovisioning an instance is to shut it down (or stop the `ansible-tower-service`) and run the Tower deprovision command:
* **Deprovisioning** - Tower does not automatically de-provision instances since it cannot distinguish between an instance that was taken offline intentionally or due to failure. Instead, the procedure for de-provisioning an instance is to shut it down (or stop the `ansible-tower-service`) and run the Tower de-provision command:
```
$ awx-manage deprovision_instance --hostname=<hostname>
@@ -179,7 +179,7 @@ $ awx-manage unregister_queue --queuename=<name>
### Configuring Instances and Instance Groups from the API
Instance Groups can be created by posting to `/api/v2/instance_groups` as a System Admin.
Instance Groups can be created by posting to `/api/v2/instance_groups` as a System Administrator.
Once created, `Instances` can be associated with an Instance Group with:
@@ -205,12 +205,13 @@ Instance Group Policies are controlled by three optional fields on an `Instance
* `Instances` that are assigned directly to `Instance Groups` by posting to `/api/v2/instance_groups/x/instances` or `/api/v2/instances/x/instance_groups` are automatically added to the `policy_instance_list`. This means they are subject to the normal caveats for `policy_instance_list` and must be manually managed.
* `policy_instance_percentage` and `policy_instance_minimum` work together. For example, if you have a `policy_instance_percentage` of 50% and a `policy_instance_minimum` of 2 and you start 6 `Instances`, 3 of them would be assigned to the `Instance Group`. If you reduce the number of `Instances` to 2 then both of them would be assigned to the `Instance Group` to satisfy `policy_instance_minimum`. In this way, you can set a lower bound on the amount of available resources.
* `policy_instance_percentage` and `policy_instance_minimum` work together. For example, if you have a `policy_instance_percentage` of 50% and a `policy_instance_minimum` of 2 and you start 6 `Instances`, 3 of them would be assigned to the `Instance Group`. If you reduce the number of `Instances` to 2, then both of them would be assigned to the `Instance Group` to satisfy `policy_instance_minimum`. In this way, you can set a lower bound on the amount of available resources.
* Policies don't actively prevent `Instances` from being associated with multiple `Instance Groups`, but this can effectively be achieved by making the percentages sum to 100. If you have 4 `Instance Groups`, assign each a percentage value of 25 and the `Instances` will be distributed among them with no overlap.
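A simplified sketch of how `policy_instance_percentage` and `policy_instance_minimum` could combine, matching the example above (this is an approximation, not the exact algorithm AWX uses):
```python
# Approximate interaction of percentage- and minimum-based policies.
import math


def instances_assigned(total_instances, percentage, minimum):
    by_percentage = math.ceil(total_instances * percentage / 100)
    return min(total_instances, max(by_percentage, minimum))


print(instances_assigned(6, 50, 2))  # 3 of 6 instances assigned
print(instances_assigned(2, 50, 2))  # both instances assigned to satisfy the minimum
```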
### Manually Pinning Instances to Specific Groups
If you have a special `Instance` which needs to be _exclusively_ assigned to a specific `Instance Group` but don't want it to automatically join _other_ groups via "percentage" or "minimum" policies:
1. Add the `Instance` to one or more `Instance Group`s' `policy_instance_list`.
@@ -243,6 +244,7 @@ Tower itself reports as much status as it can via the API at `/api/v2/ping` in o
A more detailed view of Instances and Instance Groups, including running jobs and membership
information can be seen at `/api/v2/instances/` and `/api/v2/instance_groups`.
### Instance Services and Failure Behavior
Each Tower instance is made up of several different services working collaboratively:
@@ -253,14 +255,14 @@ Each Tower instance is made up of several different services working collaborati
* **RabbitMQ** - A Message Broker, this is used as a signaling mechanism for Celery as well as any event data propagated to the application.
* **Memcached** - A local caching service for the instance it lives on.
Tower is configured in such a way that if any of these services or their components fail, then all services are restarted. If these fail sufficiently often in a short span of time, then the entire instance will be placed offline in an automated fashion in order to allow remediation without causing unexpected behavior.
Tower is configured in such a way that if any of these services or their components fail, then all services are restarted. If these failures occur often enough within a short span of time, then the entire instance will be placed offline in an automated fashion in order to allow remediation without causing unexpected behavior.
### Job Runtime Behavior
Ideally, a regular user of Tower should not notice any semantic difference to the way jobs are run and reported. Behind the scenes, however, it is worth pointing out the differences in how the system behaves.
When a job is submitted from the API interface it gets pushed into the Celery queue on RabbitMQ. A single RabbitMQ instance is the responsible master for individual queues, but each Tower instance will connect to and receive jobs from that queue using a Fair scheduling algorithm. Any instance on the cluster is just as likely to receive the work and execute the task. If an instance fails while executing jobs, then the work is marked as permanently failed.
When a job is submitted from the API interface, it gets pushed into the Dispatcher queue on RabbitMQ. A single RabbitMQ instance is the responsible master for individual queues, but each Tower instance will connect to and receive jobs from that queue using a fair-share scheduling algorithm. Any instance on the cluster is just as likely to receive the work and execute the task. If an instance fails while executing jobs, then the work is marked as permanently failed.
If a cluster is divided into separate Instance Groups, then the behavior is similar to the cluster as a whole. If two instances are assigned to a group then either one is just as likely to receive a job as any other in the same group.
@@ -270,60 +272,56 @@ It's important to note that not all instances are required to be provisioned wit
If an Instance Group is configured but all instances in that group are offline or unavailable, any jobs that are launched targeting only that group will be stuck in a waiting state until instances become available. Fallback or backup resources should be provisioned to handle any work that might encounter this scenario.
#### Project synchronization behavior
#### Project Synchronization Behavior
Project updates behave differently than they did before. Previously they were ordinary jobs that ran on a single instance. It's now important that they run successfully on any instance that could potentially run a job. Projects will sync themselves to the correct version on the instance immediately prior to running the job. If the needed revision is already locally checked out and galaxy or collections updates are not needed, then a sync may not be performed.
Project updates behave differently than they did before. Previously they were ordinary jobs that ran on a single instance. It's now important that they run successfully on any instance that could potentially run a job. Projects will sync themselves to the correct version on the instance immediately prior to running the job. If the needed revision is already locally checked out and Galaxy or Collections updates are not needed, then a sync may not be performed.
When the sync happens, it is recorded in the database as a project update with a `launch_type` of "sync" and a `job_type` of "run". Project syncs will not change the status or version of the project; instead, they will update the source tree _only_ on the instance where they run. The only exception to this behavior is when the project is in the "never updated" state (meaning that no project updates of any type have been run), in which case a sync should fill in the project's initial revision and status, and subsequent syncs should not make such changes.
#### Controlling where a particular job runs
#### Controlling Where a Particular Job Runs
By default, a job will be submitted to the `tower` queue, meaning that it can be picked up by any of the workers.
##### How to restrict the instances a job will run on
##### How to Restrict the Instances a Job Will Run On
If any of the job template, inventory,
or organization has instance groups associated with them, a job run from that job template will not be eligible for the default behavior. That means that if all of the instance associated with these three resources are out of capacity, the job will remain in the `pending` state until capacity frees up.
If the Job Template, Inventory, or Organization has instance groups associated with it, a job run from that Job Template will not be eligible for the default behavior. This means that if all of the instances associated with these three resources are out of capacity, the job will remain in the `pending` state until capacity frees up.
##### How to set up a preferred instance group
##### How to Set Up a Preferred Instance Group
The order of preference in determining which instance group to which the job gets submitted is as follows:
The order of preference in determining which instance group the job gets submitted to is as follows:
1. Job Template
2. Inventory
3. Organization (by way of Inventory)
To expand further: If instance groups are associated with the job template and all of them are at capacity, then the job will be submitted to instance groups specified on inventory, and then organization.
To expand further: If instance groups are associated with the Job Template and all of them are at capacity, then the job will be submitted to instance groups specified on Inventory, and then Organization.
The global `tower` group can still be associated with a resource, just like any of the custom instance groups defined in the playbook. This can be used to specify a preferred instance group on the job template or inventory, but still allow the job to be submitted to any instance if those are out of capacity.
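For example, a preferred instance group can be attached to a Job Template through its `instance_groups` related endpoint. The sketch below is illustrative only; the hostname, token, and IDs are placeholders, and it uses the same association convention that AWX's other related endpoints use.

```python
# Hypothetical sketch: prefer a custom instance group on a Job Template,
# while keeping the global "tower" group associated as a fallback.
import requests

AWX_URL = "https://awx.example.org"
HEADERS = {"Authorization": "Bearer REPLACE_WITH_TOKEN"}
JOB_TEMPLATE_ID = 7    # placeholder
CUSTOM_GROUP_ID = 3    # e.g., an "east-coast" instance group (placeholder)
TOWER_GROUP_ID = 1     # the default "tower" group (placeholder)

for group_id in (CUSTOM_GROUP_ID, TOWER_GROUP_ID):
    # Each association adds the group to the template's list of preferred instance groups.
    resp = requests.post(
        f"{AWX_URL}/api/v2/job_templates/{JOB_TEMPLATE_ID}/instance_groups/",
        json={"id": group_id, "associate": True},
        headers=HEADERS,
    )
    resp.raise_for_status()
```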
#### Instance Enable / Disable
In order to support temporarily taking an `Instance` offline there is a boolean property `enabled` defined on each instance.
In order to support temporarily taking an `Instance` offline, there is a boolean property `enabled` defined on each instance.
When this property is disabled no jobs will be assigned to that `Instance`. Existing jobs will finish but no new work will be
assigned.
When this property is disabled, no jobs will be assigned to that `Instance`. Existing jobs will finish but no new work will be assigned.
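As a hedged illustration, the `enabled` flag can be flipped through the instance detail endpoint; the sketch below assumes a placeholder hostname, token, and instance ID, and that the field is writable through the API in your version.

```python
# Hypothetical sketch: take an instance out of rotation before maintenance.
import requests

AWX_URL = "https://awx.example.org"
HEADERS = {"Authorization": "Bearer REPLACE_WITH_TOKEN"}
INSTANCE_ID = 2  # placeholder

# Setting enabled=False stops new jobs from being assigned; running jobs finish normally.
resp = requests.patch(
    f"{AWX_URL}/api/v2/instances/{INSTANCE_ID}/",
    json={"enabled": False},
    headers=HEADERS,
)
resp.raise_for_status()
print(resp.json()["hostname"], "enabled:", resp.json()["enabled"])
```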
## Acceptance Criteria
When verifying acceptance we should ensure the following statements are true
When verifying acceptance, we should ensure that the following statements are true:
* Tower should install as a standalone Instance
* Tower should install in a Clustered fashion
* Instance should, optionally, be able to be grouped arbitrarily into different Instance Groups
* Capacity should be tracked at the group level and capacity impact should make sense relative to what instance a job is
running on and what groups that instance is a member of.
* Instances should, optionally, be able to be grouped arbitrarily into different Instance Groups
* Capacity should be tracked at the group level and capacity impact should make sense relative to what instance a job is running on and what groups that instance is a member of
* Provisioning should be supported via the setup playbook
* De-provisioning should be supported via a management command
* All jobs, inventory updates, and project updates should run successfully
* Jobs should be able to run on hosts which it is targeted. If assigned implicitly or directly to groups then it should
only run on instances in those Instance Groups.
* Jobs should be able to run on hosts for which they are targeted; if assigned implicitly or directly to groups, then they should only run on instances in those Instance Groups
* Project updates should manifest their data on the host that will run the job immediately prior to the job running
* Tower should be able to reasonably survive the removal of all instances in the cluster
* Tower should behave in a predictable fashiong during network partitioning
* Tower should behave in a predictable fashion during network partitioning
## Testing Considerations
@@ -331,39 +329,30 @@ When verifying acceptance we should ensure the following statements are true
* Basic playbook testing to verify routing differences, including:
- Basic FQDN
- Short-name resolution
- ip addresses
- /etc/hosts static routing information
* We should test behavior of large and small clusters. I would envision small clusters as 2 - 3 instances and large
clusters as 10 - 15 instances
* Failure testing should involve killing single instances and killing multiple instances while the cluster is performing work.
Job failures during the time period should be predictable and not catastrophic.
* Instance downtime testing should also include recoverability testing. Killing single services and ensuring the system can
return itself to a working state
* Persistent failure should be tested by killing single services in such a way that the cluster instance cannot be recovered
and ensuring that the instance is properly taken offline
* Network partitioning failures will be important also. In order to test this
- IP addresses
- `/etc/hosts` static routing information
* We should test behavior of large and small clusters; small clusters usually consist of 2 - 3 instances and large clusters have 10 - 15 instances.
* Failure testing should involve killing single instances and killing multiple instances while the cluster is performing work. Job failures during the time period should be predictable and not catastrophic.
* Instance downtime testing should also include recoverability testing (killing single services and ensuring the system can return itself to a working state).
* Persistent failure should be tested by killing single services in such a way that the cluster instance cannot be recovered and ensuring that the instance is properly taken offline.
* Network partitioning failures will also be important. In order to test this:
- Disallow a single instance from communicating with the other instances but allow it to communicate with the database
- Break the link between instances such that it forms 2 or more groups where groupA and groupB can't communicate but all instances
can communicate with the database.
* Crucially when network partitioning is resolved all instances should recover into a consistent state
* Upgrade Testing, verify behavior before and after are the same for the end user.
* Project Updates should be thoroughly tested for all scm types (git, svn, hg) and for manual projects.
- Break the link between instances such that it forms two or more groups where Group A and Group B can't communicate but all instances can communicate with the database.
* Crucially, when network partitioning is resolved, all instances should recover into a consistent state.
* Upgrade Testing - verify behavior before and after are the same for the end user.
* Project Updates should be thoroughly tested for all SCM types (`git`, `svn`, `hg`) and for manual projects.
* Setting up instance groups in two scenarios:
a) instances are shared between groups
b) instances are isolated to particular groups
Organizations, Inventories, and Job Templates should be variously assigned to one or many groups and jobs should execute
in those groups in preferential order as resources are available.
Organizations, Inventories, and Job Templates should be variously assigned to one or many groups and jobs should execute in those groups in preferential order as resources are available.
## Performance Testing
Performance testing should be twofold.
Performance testing should be twofold:
* Large volume of simultaneous jobs.
* Jobs that generate a large amount of output.
* A large volume of simultaneous jobs
* Jobs that generate a large amount of output
These should also be benchmarked against the same playbooks using the 3.0.X Tower release and a stable Ansible version.
For a large volume playbook I might recommend a customer provided one that we've seen recently:
These should also be benchmarked against the same playbooks using the 3.0.X Tower release and a stable Ansible version. For a large volume playbook (*e.g.*, against 100+ hosts), something like the following is recommended:
https://gist.github.com/michelleperz/fe3a0eb4eda888221229730e34b28b89
Against 100+ hosts.

View File

@@ -1,19 +1,18 @@
## Collections
AWX supports using Ansible collections.
This section will give ways to use collections in job runs.
AWX supports the use of Ansible Collections. This section describes how to use Collections in job runs.
### Project Collections Requirements
If you specify a collections requirements file in SCM at `collections/requirements.yml`,
then AWX will install collections in that file in the implicit project sync
If you specify a Collections requirements file in SCM at `collections/requirements.yml`,
then AWX will install Collections in that file in the implicit project sync
before a job run. The invocation is:
```
ansible-galaxy collection install -r requirements.yml -p <job tmp location>
```
Example of tmp directory where job is running:
Example of `tmp` directory where job is running:
```
├── project

View File

@@ -2,7 +2,7 @@ Credential Plugins
==================
By default, sensitive credential values (such as SSH passwords, SSH private
keys, API tokens for cloud services) in AWX are stored in the AWX database
keys, API tokens for cloud services, etc.) in AWX are stored in the AWX database
after being encrypted with a symmetric encryption cipher utilizing AES-256 in
CBC mode alongside a SHA-256 HMAC.
@@ -19,9 +19,9 @@ When configuring AWX to pull a secret from a third party system, there are
generally three steps.
Here is an example of creating an (1) AWX Machine Credential with
a static username, `example-user` and (2) an externally sourced secret from
a static username, `example-user` and (2) an externally-sourced secret from
HashiCorp Vault Key/Value system which will populate the (3) password field on
the Machine Credential.
the Machine Credential:
1. Create the Machine Credential with a static username, `example-user`.
@@ -29,13 +29,13 @@ the Machine Credential.
secret management system (in this example, specifying a URL and an
OAuth2.0 token _to access_ HashiCorp Vault)
3. _Link_ the `password` field for the Machine credential to the external
system by specifying the source (in this example, the HashiCorp credential)
3. _Link_ the `password` field for the Machine Credential to the external
system by specifying the source (in this example, the HashiCorp Credential)
and metadata about the path (e.g., `/some/path/to/my/password/`).
Note that you can perform these lookups on *any* field for any non-external
credential, including those with custom credential types. You could just as
easily create an AWS credential and use lookups to retrieve the Access Key and
easily create an AWS Credential and use lookups to retrieve the Access Key and
Secret Key from an external secret management system. External credentials
cannot have lookups applied to their fields.
@@ -150,10 +150,10 @@ HashiCorp Vault KV
AWX supports retrieving secret values from HashiCorp Vault KV
(https://www.vaultproject.io/api/secret/kv/)
The following example illustrates how to configure a Machine credential to pull
its password from an HashiCorp Vault:
The following example illustrates how to configure a Machine Credential to pull
its password from a HashiCorp Vault:
1. Look up the ID of the Machine and HashiCorp Vault Secret Lookup credential
1. Look up the ID of the Machine and HashiCorp Vault Secret Lookup Credential
types (in this example, `1` and `15`):
```shell
@@ -182,7 +182,7 @@ HTTP/1.1 200 OK
...
```
2. Create a Machine and a HashiCorp Vault credential:
2. Create a Machine and a HashiCorp Vault Credential:
```shell
~ curl -sik "https://awx.example.org/api/v2/credentials/" \
@@ -214,7 +214,7 @@ HTTP/1.1 201 Created
...
```
3. Link the Machine credential to the HashiCorp Vault credential:
3. Link the Machine Credential to the HashiCorp Vault Credential:
```shell
~ curl -sik "https://awx.example.org/api/v2/credentials/1/input_sources/" \
@@ -232,10 +232,10 @@ HashiCorp Vault SSH Secrets Engine
AWX supports signing public keys via HashiCorp Vault's SSH Secrets Engine
(https://www.vaultproject.io/api/secret/ssh/)
The following example illustrates how to configure a Machine credential to sign
The following example illustrates how to configure a Machine Credential to sign
a public key using HashiCorp Vault:
1. Look up the ID of the Machine and HashiCorp Vault Signed SSH credential
1. Look up the ID of the Machine and HashiCorp Vault Signed SSH Credential
types (in this example, `1` and `16`):
```shell
@@ -263,7 +263,7 @@ HTTP/1.1 200 OK
"name": "HashiCorp Vault Signed SSH",
```
2. Create a Machine and a HashiCorp Vault credential:
2. Create a Machine and a HashiCorp Vault Credential:
```shell
~ curl -sik "https://awx.example.org/api/v2/credentials/" \
@@ -295,7 +295,7 @@ HTTP/1.1 201 Created
...
```
3. Link the Machine credential to the HashiCorp Vault credential:
3. Link the Machine Credential to the HashiCorp Vault Credential:
```shell
~ curl -sik "https://awx.example.org/api/v2/credentials/1/input_sources/" \
@@ -306,7 +306,7 @@ HTTP/1.1 201 Created
HTTP/1.1 201 Created
```
4. Associate the Machine credential with a Job Template. When the Job Template
4. Associate the Machine Credential with a Job Template. When the Job Template
is run, AWX will use the provided HashiCorp URL and token to sign the
unsigned public key data using the HashiCorp Vault SSH Secrets API.
AWX will generate an `id_rsa` and `id_rsa-cert.pub` on the fly and

View File

@@ -27,7 +27,7 @@ Important Changes
By utilizing these custom ``Credential Types``, customers have the ability to
define custom "Cloud" and "Network" ``Credential Types`` which
modify environment variables, extra vars, and generate file-based
credentials (such as file-based certificates or .ini files) at
credentials (such as file-based certificates or `.ini` files) at
`ansible-playbook` runtime.
* Multiple ``Credentials`` can now be assigned to a ``Job Template`` as long as
@@ -136,9 +136,10 @@ ordered fields for that type:
"multiline": false # if true, the field should be rendered
# as multi-line for input entry
# (only applicable to `type=string`)
"default": "default value" # optional, can be used to provide a
# default value if the field is left empty
# when creating a credential of this type
# default value if the field is left empty;
# when creating a credential of this type,
# credential forms will use this value
# as a prefill when making credentials of
# this type
@@ -164,7 +165,7 @@ When `type=string`, fields can optionally specify multiple choice options:
Defining Custom Credential Type Injectors
-----------------------------------------
A ``Credential Type`` can inject ``Credential`` values through the use
of the Jinja templating language (which should be familiar to users of Ansible):
of the [Jinja templating language](https://jinja.palletsprojects.com/en/2.10.x/) (which should be familiar to users of Ansible):
"injectors": {
"env": {
@@ -175,7 +176,7 @@ of the Jinja templating language (which should be familiar to users of Ansible):
}
}
``Credential Types`` can also generate temporary files to support .ini files or
``Credential Types`` can also generate temporary files to support `.ini` files or
certificate/key data:
"injectors": {
@@ -274,7 +275,7 @@ Additional Criteria
Acceptance Criteria
-------------------
When verifying acceptance we should ensure the following statements are true:
When verifying acceptance, the following statements should be true:
* `Credential` injection for playbook runs, SCM updates, inventory updates, and
ad-hoc runs should continue to function as they did prior to Tower 3.2 for the
@@ -290,15 +291,15 @@ When verifying acceptance we should ensure the following statements are true:
* Users should not be able to use the syntax for injecting single and
multiple files in the same custom credential.
* The default `Credential Types` included with Tower in 3.2 should be
non-editable/readonly and cannot be deleted by any user.
non-editable/read-only and unable to be deleted by any user.
* Stored `Credential` values for _all_ types should be consistent before and
after Tower 3.2 migration/upgrade.
after a Tower 3.2 migration/upgrade.
* `Job Templates` should be able to specify multiple extra `Credentials` as
defined in the constraints in this document.
* Custom inventory sources should be able to specify a cloud/network
`Credential` and they should properly update the environment (environment
variables, extra vars, written files) when an inventory source update runs.
* If a `Credential Type` is being used by one or more `Credentials`, the fields
defined in its ``inputs`` should be read-only.
* `Credential Types` should support activity stream history for basic object
defined in its `inputs` should be read-only.
* `Credential Types` should support Activity Stream history for basic object
modification.

View File

@@ -1,233 +0,0 @@
Multi-Credential Assignment
===========================
awx has added support for assigning zero or more credentials to
JobTemplates and InventoryUpdates via a singular, unified interface.
Background
----------
Prior to awx (Tower 3.2), Job Templates had a certain set of requirements
surrounding their relation to Credentials:
* All Job Templates (and Jobs) were required to have exactly *one* Machine/SSH
or Vault credential (or one of both).
* All Job Templates (and Jobs) could have zero or more "extra" Credentials.
* These extra Credentials represented "Cloud" and "Network" credentials that
could be used to provide authentication to external services via environment
variables (e.g., `AWS_ACCESS_KEY_ID`).
This model required a variety of disjoint interfaces for specifying Credentials
on a JobTemplate. For example, to modify assignment of Machine/SSH and Vault
credentials, you would change the Credential key itself:
`PATCH /api/v2/job_templates/N/ {'credential': X, 'vault_credential': Y}`
Modifying `extra_credentials` was accomplished on a separate API endpoint
via association/disassociation actions:
```
POST /api/v2/job_templates/N/extra_credentials {'associate': true, 'id': Z}
POST /api/v2/job_templates/N/extra_credentials {'disassociate': true, 'id': Z}
```
This model lacked the ability to associate multiple Vault credentials with
a playbook run, a use case supported by Ansible core from Ansible 2.4 onwards.
This model also was a stumbling block for certain playbook execution workflows.
For example, some users wanted to run playbooks with `connection:local` that
only interacted with some cloud service via a cloud Credential. In this
scenario, users often generated a "dummy" Machine/SSH Credential to attach to
the Job Template simply to satisfy the requirement on the model.
Important Changes
-----------------
JobTemplates now have a single interface for Credential assignment:
`GET /api/v2/job_templates/N/credentials/`
Users can associate and disassociate credentials using `POST` requests to this
interface, similar to the behavior in the now-deprecated `extra_credentials`
endpoint:
```
POST /api/v2/job_templates/N/credentials/ {'associate': true, 'id': X}
POST /api/v2/job_templates/N/credentials/ {'disassociate': true, 'id': Y}
```
Under this model, a JobTemplate is considered valid even when it has _zero_
Credentials assigned to it.
Launch Time Considerations
--------------------------
Prior to this change, JobTemplates had a configurable attribute,
`ask_credential_on_launch`. This value was used at launch time to determine
which missing credential values were necessary for launch - this was primarily
used as a mechanism for users to specify an SSH (or Vault) credential to satisfy
the minimum Credential requirement.
Under the new unified Credential list model, this attribute still exists, but it
is no longer bound to a notion of "requiring" a Credential. Now when
`ask_credential_on_launch` is `True`, it signifies that users may (if they
wish) specify a list of credentials at launch time to override those defined on
the JobTemplate:
`POST /api/v2/job_templates/N/launch/ {'credentials': [A, B, C]}`
If `ask_credential_on_launch` is `False`, it signifies that custom `credentials`
provided in the payload to `POST /api/v2/job_templates/N/launch/` will be
ignored.
Under this model, the only purpose for `ask_credential_on_launch` is to signal
that API clients should prompt the user for (optional) changes at launch time.
Backwards Compatibility Concerns
--------------------------------
Requests to update `JobTemplate.credential` and `JobTemplate.vault_credential`
will no longer work. Example request format:
`PATCH /api/v2/job_templates/N/ {'credential': X, 'vault_credential': Y}`
This request will have no effect because support for using these
fields has been removed.
The relationship `extra_credentials` is deprecated but still supported for now.
Clients should favor the `credentials` relationship instead.
`GET` requests to `/api/v2/job_templates/N/` and `/api/v2/jobs/N/`
will include this via `related_fields`:
```
{
"related": {
...
"credentials": "/api/v2/job_templates/5/credentials/",
"extra_credentials": "/api/v2/job_templates/5/extra_credentials/",
}
}
```
...and `summary_fields`, which is not included in list views:
```
{
"summary_fields": {
"credentials": [
{
"description": "",
"credential_type_id": 5,
"id": 2,
"kind": "aws",
"name": "some-aws"
},
{
"description": "",
"credential_type_id": 10,
"id": 4,
"kind": "gce",
"name": "some-gce"
}
],
"extra_credentials": [
{
"description": "",
"credential_type_id": 5,
"id": 2,
"kind": "aws",
"name": "some-aws"
},
{
"description": "",
"credential_type_id": 10,
"id": 4,
"kind": "gce",
"name": "some-gce"
}
],
}
}
```
The only difference between `credentials` and `extra_credentials` is that the
latter is filtered to only show "cloud" type credentials, whereas the former
can be used to manage all types of related credentials.
The `/api/v2/job_templates/N/launch/` endpoint no longer provides
backwards compatible support for specifying credentials at launch time
via the `credential` or `vault_credential` fields.
The launch endpoint can still accept a list under the `extra_credentials` key,
but this is deprecated in favor of `credentials`.
Specifying Multiple Vault Credentials
-------------------------------------
One interesting use case supported by the new "zero or more credentials" model
is the ability to assign multiple Vault credentials to a Job Template run.
This specific use case covers Ansible's support for multiple vault passwords for
a playbook run (since Ansible 2.4):
http://docs.ansible.com/ansible/latest/vault.html#vault-ids-and-multiple-vault-passwords
Vault credentials in awx now have an optional field, `vault_id`, which is
analogous to the `--vault-id` argument to `ansible-playbook`. To run
a playbook which makes use of multiple vault passwords:
1. Make a Vault credential in Tower for each vault password; specify the Vault
ID as a field on the credential and input the password (which will be
encrypted and stored).
2. Assign multiple vault credentials to the job template via the new
`credentials` endpoint:
```
POST /api/v2/job_templates/N/credentials/
{
'associate': true,
'id': X
}
```
3. Launch the job template, and `ansible-playbook` will be invoked with
multiple `--vault-id` arguments.
Prompted Vault Credentials
--------------------------
Vault credentials can have passwords that are marked as "Prompt on launch".
When this is the case, the launch endpoint of any related Job Templates will
communicate necessary Vault passwords via the `passwords_needed_to_start` key:
```
GET /api/v2/job_templates/N/launch/
{
'passwords_needed_to_start': [
'vault_password.X',
'vault_password.Y',
]
}
```
...where `X` and `Y` are primary keys of the associated Vault credentials.
```
POST /api/v2/job_templates/N/launch/
{
'credential_passwords': {
'vault_password.X': 'first-vault-password',
'vault_password.Y': 'second-vault-password'
}
}
```
Inventory Source Credentials
----------------------------
Inventory sources and inventory updates that they spawn also use the same
relationship. The new endpoints for this are
- `/api/v2/inventory_sources/N/credentials/` and
- `/api/v2/inventory_updates/N/credentials/`
Most cloud sources will continue to adhere to the constraint that they
must have a single credential that corresponds to their cloud type.
However, this relationship allows users to associate multiple vault
credentials of different ids to inventory sources.

View File

@@ -7,13 +7,13 @@ The intended audience of this document is the Ansible Tower developer.
### RBAC - System Basics
There are three main concepts to be familiar with, Roles, Resources, and Users.
There are three main concepts to be familiar with: Roles, Resources, and Users.
Users can be members of a role, which gives them certain access to any
resources associated with that role, or any resources associated with "descendent"
roles.
For example, if I have an organization named "MyCompany" and I want to allow
two people, "Alice", and "Bob", access to manage all the settings associated
two people, "Alice", and "Bob", access to manage all of the settings associated
with that organization, I'd make them both members of the organization's `admin_role`.
It is often the case that you have many Roles in a system, and you want some
@@ -21,9 +21,9 @@ roles to include all of the capabilities of other roles. For example, you may
want a System Administrator to have access to everything that an Organization
Administrator has access to, who has everything that a Project Administrator
has access to, and so on. We refer to this concept as the 'Role Hierarchy', and
is represented by allowing Roles to have "Parent Roles". Any permission that a
Role has is implicitly granted to any parent roles (or parents of those
parents, and so on). Of course Roles can have more than one parent, and
is represented by allowing roles to have "Parent Roles". Any permission that a
role has is implicitly granted to any parent roles (or parents of those
parents, and so on). Of course roles can have more than one parent, and
capabilities are implicitly granted to all parents. (Technically speaking, this
forms a directional acyclic graph instead of a strict hierarchy, but the
concept should remain intuitive.)
@@ -34,10 +34,10 @@ concept should remain intuitive.)
### Implementation Overview
The RBAC system allows you to create and layer roles for controlling access to resources. Any Django Model can
be made into a resource in the RBAC system by using the `ResourceMixin`. Once a model is accessible as a resource you can
be made into a resource in the RBAC system by using the `ResourceMixin`. Once a model is accessible as a resource, you can
extend the model definition to have specific roles using the `ImplicitRoleField`. Within the declaration of
this role field you can also specify any parents the role may have, and the RBAC system will take care of
all the appropriate ancestral binding that takes place behind the scenes to ensure that the model you've declared
all of the appropriate ancestral binding that takes place behind the scenes to ensure that the model you've declared
is kept up to date as the relations in your model change.
### Roles
@@ -52,7 +52,7 @@ what roles are checked when accessing a resource.
| -- AdminRole
|-- parent = ResourceA.AdminRole
When a user attempts to access ResourceB we will check for their access using the set of all unique roles, including the parents.
When a user attempts to access ResourceB, we will check for their access using the set of all unique roles, including the parents.
ResourceA.AdminRole, ResourceB.AdminRole
@@ -60,7 +60,7 @@ This would provide any members of the above roles with access to ResourceB.
#### Singleton Role
There is a special case _Singleton Role_ that you can create. This type of role is for system wide roles.
There is a special case _Singleton Role_ that you can create. This type of role is for system-wide roles.
### Models
@@ -72,7 +72,7 @@ The RBAC system defines a few new models. These models represent the underlying
##### `visible_roles(cls, user)`
`visible_roles` is a class method that will lookup all of the `Role` instances a user can "see". This includes any roles the user is a direct decendent of as well as any ancestor roles.
`visible_roles` is a class method that will look up all of the `Role` instances a user can "see". This includes any roles the user is a direct descendant of, as well as any ancestor roles.
##### `singleton(cls, name)`
@@ -137,7 +137,7 @@ By mixing in the `ResourceMixin` to your model, you are turning your model in to
## Usage
After exploring the _Overview_ the usage of the RBAC implementation in your code should feel unobtrusive and natural.
After exploring the _Overview_, the usage of the RBAC implementation in your code should feel unobtrusive and natural.
```python
# make your model a Resource
@@ -150,7 +150,7 @@ After exploring the _Overview_ the usage of the RBAC implementation in your code
)
```
Now that your model is a resource and has a `Role` defined, you can begin to access the helper methods provided to you by the `ResourceMixin` for checking a users access to your resource. Here is the output of a Python REPL session.
Now that your model is a resource and has a `Role` defined, you can begin to access the helper methods provided to you by the `ResourceMixin` for checking a user's access to your resource. Here is the output of a Python REPL session:
```python
# we've created some documents and a user

View File

@@ -1,7 +1,7 @@
# Relaunch on Hosts with Status
This feature allows the user to relaunch a job, targeting only hosts marked
as failed in the original job.
This feature allows the user to relaunch a job, targeting only the hosts marked
as "failed" in the original job.
### Definition of "failed"
@@ -10,27 +10,27 @@ is different from "hosts with failed tasks". Unreachable hosts can have
no failed tasks. This means that the count of "failed hosts" can be different
from the failed count, given in the summary at the end of a playbook.
This definition corresponds to Ansible .retry files.
This definition corresponds to Ansible `.retry` files.
### API Design of Relaunch
#### Basic Relaunch
POST to `/api/v2/jobs/N/relaunch/` without any request data should relaunch
POSTs to `/api/v2/jobs/N/relaunch/` without any request data should relaunch
the job with the same `limit` value that the original job used, which
may be an empty string.
This is implicitly the "all" option below.
This is implicitly the "all" option, mentioned below.
#### Relaunch by Status
Providing request data containing `{"hosts": "failed"}` should change
the `limit` of the relaunched job to target failed hosts from the previous
job. Hosts will be provided as a comma-separated list in the limit. Formally,
these are options
these are options:
- all: relaunch without changing the job limit
- failed: relaunch against all hos
- failed: relaunch against all hosts
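To make the options above concrete, here is a minimal sketch (placeholder hostname, token, and job ID; not part of the original document) of relaunching a job against only its failed hosts:

```python
# Hypothetical sketch: relaunch job N, limiting it to the hosts that failed last time.
import requests

AWX_URL = "https://awx.example.org"
HEADERS = {"Authorization": "Bearer REPLACE_WITH_TOKEN"}
JOB_ID = 123  # placeholder

resp = requests.post(
    f"{AWX_URL}/api/v2/jobs/{JOB_ID}/relaunch/",
    json={"hosts": "failed"},  # omit the body (or use "all") to keep the original limit
    headers=HEADERS,
)
resp.raise_for_status()
print("relaunched as job", resp.json()["id"])
```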
### Relaunch Endpoint
@@ -60,12 +60,12 @@ then the request will be rejected. For example, if a GET yielded:
}
```
Then a POST of `{"hosts": "failed"}` should return a descriptive response
...then a POST of `{"hosts": "failed"}` should return a descriptive response
with a 400-level status code.
# Acceptance Criteria
Scenario: user launches a job against host "foobar", and the run fails
Scenario: User launches a job against host "foobar", and the run fails
against this host. User changes name of host to "foo", and relaunches job
against failed hosts. The `limit` of the relaunched job should reference
"foo" and not "foobar".
@@ -79,9 +79,9 @@ relaunch the same way that relaunching has previously worked.
If a playbook provisions a host, this feature should behave reasonably
when relaunching against a status that includes these hosts.
Feature should work even if hosts have tricky characters in their names,
This feature should work even if hosts have tricky characters in their names,
like commas.
Also need to consider case where a task `meta: clear_host_errors` is present
inside a playbook, and that the retry subset behavior is the same as Ansible
One may also need to consider cases where a task `meta: clear_host_errors` is present
inside a playbook; the retry subset behavior is the same as Ansible's
for this case.

View File

@@ -1,20 +1,20 @@
Background Tasks in AWX
=======================
In this document, we will go into a bit of detail about how and when AWX runs Python code _in the background_ (_i.e._, _outside_ of the context of an HTTP request), such as:
In this document, we will go into a bit of detail about how and when AWX runs Python code _in the background_ (_i.e._, **outside** of the context of an HTTP request), such as:
* Any time a Job is launched in AWX (a Job Template, an Ad Hoc Command, a Project
Update, an Inventory Update, a System Job), a background process retrieves
metadata _about_ that job from the database and forks some process (_e.g._,
`ansible-playbook`, `awx-manage inventory_import`)
* Certain expensive or time-consuming tasks run in the background
* Certain expensive or time-consuming tasks running in the background
asynchronously (_e.g._, when deleting an inventory).
* AWX runs a variety of periodic background tasks on a schedule. Some examples
are:
- AWX's "Task Manager/Scheduler" wakes up periodically and looks for
`pending` jobs that have been launched and are ready to start running.
`pending` jobs that have been launched and are ready to start running
- AWX periodically runs code that looks for scheduled jobs and launches
them.
them
- AWX runs a variety of periodic tasks that clean up temporary files, and
performs various administrative checks
- Every node in an AWX cluster runs a periodic task that serves as

View File

@@ -1,11 +1,11 @@
Tower configuration gives tower users the ability to adjust multiple runtime parameters of Tower, thus take fine-grained control over Tower run.
Tower configuration gives Tower users the ability to adjust multiple runtime parameters of Tower, which enables much more fine-grained control over Tower runs.
## Usage manual
#### To use
The REST endpoint for CRUD operations against Tower configurations is `/api/<version #>/settings/`. GETing to that endpoint will return a list of available Tower configuration categories and their urls, such as `"system": "/api/<version #>/settings/system/"`. The URL given to each category is the endpoint for CRUD operations against individual settings under that category.
#### To Use:
The REST endpoint for CRUD operations against Tower configurations can be found at `/api/<version #>/settings/`. GETing to that endpoint will return a list of available Tower configuration categories and their URLs, such as `"system": "/api/<version #>/settings/system/"`. The URL given to each category is the endpoint for CRUD operations against individual settings under that category.
Here is a typical Tower configuration category GET response.
Here is a typical Tower configuration category GET response:
```
GET /api/v2/settings/github-team/
HTTP 200 OK
@@ -27,10 +27,10 @@ X-API-Time: 0.026s
}
```
The returned body is a JSON of key-value pairs, where the key is the name of Tower configuration setting, and the value is the value of that setting. To update the settings, simply update setting values and PUT/PATCH to the same endpoint.
The returned body is a JSON of key-value pairs, where the key is the name of the Tower configuration setting, and the value is the value of that setting. To update the settings, simply update setting values and PUT/PATCH to the same endpoint.
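For instance, reading and updating a category might look like the sketch below; the hostname, token, and the specific setting name are placeholders rather than values from this document.

```python
# Hypothetical sketch: read the "system" settings category and patch one value.
import requests

AWX_URL = "https://awx.example.org"
HEADERS = {"Authorization": "Bearer REPLACE_WITH_TOKEN"}

# GET returns the current key/value pairs for the category.
current = requests.get(f"{AWX_URL}/api/v2/settings/system/", headers=HEADERS)
current.raise_for_status()
print(current.json())

# PATCH only the keys you want to change; unlisted keys keep their current values.
update = requests.patch(
    f"{AWX_URL}/api/v2/settings/system/",
    json={"ORG_ADMINS_CAN_SEE_ALL_USERS": True},  # placeholder setting name
    headers=HEADERS,
)
update.raise_for_status()
```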
#### To develop
Each Django app in tower should have a `conf.py` file where related settings get registered. Below is the general format for `conf.py`:
#### To Develop:
Each Django app in Tower should have a `conf.py` file where related settings get registered. Below is the general format for `conf.py`:
```python
# Other dependencies
@@ -52,7 +52,7 @@ register(
# Other setting registries
```
`register` is the endpoint API for registering individual tower configurations:
`register` is the endpoint API for registering individual Tower configurations:
```
register(
setting,
@@ -66,34 +66,34 @@ register(
defined_in_file=False,
)
```
Here is the details of each argument:
Here are the details for each argument:
| Argument Name | Argument Value Type | Description |
|--------------------------|-------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| `setting` | `str` | Name of the setting. Usually all-capital connected by underscores like `'FOO_BAR'` |
| `field_class` | a subclass of DRF serializer field available in `awx.conf.fields` | The class wrapping around value of the configuration, responsible for retrieving, setting, validating and storing configuration values. |
| `**field_related_kwargs` | **kwargs | Key-worded arguments needed to initialize an instance of `field_class`. |
| `**field_related_kwargs` | `**kwargs` | Key-worded arguments needed to initialize an instance of `field_class`. |
| `category_slug` | `str` | The actual identifier used for finding individual setting categories. |
| `category` | transformable string, like `_('foobar')` | The human-readable form of `category_slug`, mainly for display. |
| `depends_on` | `list` of `str`s | A list of setting names this setting depends on. A setting this setting depends on is another tower configuration setting whose changes may affect the value of this setting. |
| `placeholder` | transformable string, like `_('foobar')` | A human-readable string displaying a typical value for the setting, mainly used by UI |
| `encrypted` | `boolean` | Flag determining whether the setting value should be encrypted |
| `defined_in_file` | `boolean` | Flag determining whether a value has been manually set in settings file. |
| `depends_on` | `list` of `str`s | A list of setting names this setting depends on. A setting this setting depends on is another Tower configuration setting whose changes may affect the value of this setting. |
| `placeholder` | transformable string, like `_('foobar')` | A human-readable string displaying a typical value for the setting, mainly used by the UI. |
| `encrypted` | `boolean` | A flag which determines whether the setting value should be encrypted. |
| `defined_in_file` | `boolean` | A flag which determines whether a value has been manually set in the settings file. |
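Putting the arguments above together, a registration might look like the following sketch. The setting name, default, label, and help text are illustrative assumptions, and the `awx.conf` import path is assumed from the registration description rather than quoted from Tower's own `conf.py` files.

```python
# conf.py (hypothetical sketch)
from django.utils.translation import ugettext_lazy as _

from awx.conf import fields, register


register(
    'EXAMPLE_FEATURE_ENABLED',           # setting name: all caps, connected by underscores
    field_class=fields.BooleanField,     # DRF-style field from awx.conf.fields
    default=False,                       # field kwarg; gives 'revert all' something to revert to
    label=_('Enable the example feature'),
    help_text=_('Placeholder setting used only to illustrate registration.'),
    category=_('System'),                # human-readable category name
    category_slug='system',              # identifier used in /api/v2/settings/<slug>/
)
```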
During Tower bootstrapping, All settings registered in `conf.py` modules of Tower Django apps will be loaded (registered). The set of Tower configuration settings will form a new top-level of `django.conf.settings` object. Later all Tower configuration settings will be available as attributes of it, just like normal Django settings. Note Tower configuration settings take higher priority over normal settings, meaning if a setting `FOOBAR` is both defined in a settings file and registered in a `conf.py`, the registered attribute will be used over the defined attribute every time.
During Tower bootstrapping, **all** settings registered in `conf.py` modules of Tower Django apps will be loaded (registered). This set of Tower configuration settings will form a new top-level of the `django.conf.settings` object. Later, all Tower configuration settings will be available as attributes of it, just like the normal Django settings. Note that Tower configuration settings take higher priority over normal settings, meaning if a setting `FOOBAR` is both defined in a settings file *and* registered in `conf.py`, the registered attribute will be used over the defined attribute every time.
Note when registering new configurations, it is desired to provide a default value if it is possible to do so, as Tower configuration UI has a 'revert all' functionality that revert all settings to it's default value.
Please note that when registering new configurations, it is recommended to provide a default value if it is possible to do so, as the Tower configuration UI has a 'revert all' functionality that reverts all settings to their default values.
Starting from 3.2, Tower configuration supports category-specific validation functions. They should also be defined under `conf.py` in the form
Starting with version 3.2, Tower configuration supports category-specific validation functions. They should also be defined under `conf.py` in the form
```python
def custom_validate(serializer, attrs):
'''
Method details
'''
```
Where argument `serializer` refers to the underlying `SettingSingletonSerializer` object, and `attrs` refers to a dictionary of input items.
...where the argument `serializer` refers to the underlying `SettingSingletonSerializer` object, and `attrs` refers to a dictionary of input items.
Then at the end of `conf.py`, register defined custom validation methods to different configuration categories (`category_slug`) using `awx.conf.register_validate`:
At the end of `conf.py`, register defined custom validation methods to different configuration categories (`category_slug`) using `awx.conf.register_validate`:
```python
# conf.py
...

View File

@@ -4,19 +4,18 @@ Our channels/websocket implementation handles the communication between Tower AP
## Architecture
Tower enlists the help of the `django-channels` library to create our communications layer. `django-channels` provides us with per-client messaging integration in to our application by implementing the Asynchronous Server Gateway Interface or ASGI.
Tower enlists the help of the `django-channels` library to create our communications layer. `django-channels` provides us with per-client messaging integration in our application by implementing the Asynchronous Server Gateway Interface (ASGI).
To communicate between our different services we use RabbitMQ to exchange messages. Traditionally, `django-channels` uses Redis, but Tower uses a custom `asgi_amqp` library that allows use to RabbitMQ for the same purpose.
To communicate between our different services we use RabbitMQ to exchange messages. Traditionally, `django-channels` uses Redis, but Tower uses a custom `asgi_amqp` library that allows access to RabbitMQ for the same purpose.
Inside Tower we use the emit_channel_notification which places messages on to the queue. The messages are given an explicit
event group and event type which we later use in our wire protocol to control message delivery to the client.
Inside Tower we use the `emit_channel_notification` function which places messages onto the queue. The messages are given an explicit event group and event type which we later use in our wire protocol to control message delivery to the client.
## Protocol
You can connect to the Tower channels implementation using any standard websocket library but pointing it to `/websocket`. You must
You can connect to the Tower channels implementation using any standard websocket library by pointing it to `/websocket`. You must
provide a valid Auth Token in the request URL.
Once you've connected, you are not subscribed to any event groups. You subscribe by sending a json request that looks like the following:
Once you've connected, you are not subscribed to any event groups. You subscribe by sending a `json` request that looks like the following:
'groups': {
'jobs': ['status_changed', 'summary'],
@@ -30,37 +29,28 @@ Once you've connected, you are not subscribed to any event groups. You subscribe
'control': ['limit_reached_<user_id>'],
}
These map to the event group and event type you are interested in. Sending in a new groups dictionary will clear all of your previously
subscribed groups before subscribing to the newly requested ones. This is intentional, and makes the single page navigation much easier since
you only need to care about current subscriptions.
These map to the event group and event type that the user is interested in. Sending in a new groups dictionary will clear all previously-subscribed groups before subscribing to the newly requested ones. This is intentional, and makes the single page navigation much easier since users only need to care about current subscriptions.
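A minimal client interaction, assuming the `websocket-client` package and a placeholder host and auth token (token placement in the URL is an assumption based on the description above), might look like the following sketch:

```python
# Hypothetical sketch: subscribe to job status/summary events over /websocket.
import json

import websocket  # pip install websocket-client

# Placeholders: the host and token are not from this document.
ws = websocket.create_connection(
    "wss://awx.example.org/websocket?token=REPLACE_WITH_TOKEN"
)

# Sending a new groups dict replaces any previously subscribed groups.
ws.send(json.dumps({
    "groups": {
        "jobs": ["status_changed", "summary"],
        "control": ["limit_reached_1"],  # placeholder user id
    }
}))

print(ws.recv())  # each received frame is a JSON event for a subscribed group
ws.close()
```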
## Deployment
This section will specifically discuss deployment in the context of websockets and the path your request takes through the system.
This section will specifically discuss deployment in the context of websockets and the path those requests take through the system.
Note: The deployment of Tower changes slightly with the introduction of `django-channels` and websockets. There are some minor differences between
production and development deployments that I will point out, but the actual services that run the code and handle the requests are identical
between the two environments.
**Note:** The deployment of Tower changes slightly with the introduction of `django-channels` and websockets. There are some minor differences between production and development deployments that will be pointed out in this document, but the actual services that run the code and handle the requests are identical between the two environments.
### Services
| Name | Details |
|:-----------:|:-----------------------------------------------------------------------------------------------------------:|
| nginx | listens on ports 80/443, handles HTTPS proxying, serves static assets, routes requests for daphne and uwsgi |
| uwsgi | listens on port 8050, handles API requests |
| daphne | listens on port 8051, handles Websocket requests |
| runworker | no listening port, watches and processes the message queue |
| supervisord | (production-only) handles the process management of all the services except nginx |
| `nginx` | listens on ports 80/443, handles HTTPS proxying, serves static assets, routes requests for `daphne` and `uwsgi` |
| `uwsgi` | listens on port 8050, handles API requests |
| `daphne` | listens on port 8051, handles websocket requests |
| `runworker` | no listening port, watches and processes the message queue |
| `supervisord` | (production-only) handles the process management of all the services except `nginx` |
When a request comes in to *nginx* and have the `Upgrade` header and is for the path `/websocket`, then *nginx* knows that it should
be routing that request to our *daphne* service.
When a request comes in to `nginx` with the `Upgrade` header and is for the path `/websocket`, `nginx` knows that it should route that request to our `daphne` service.
*daphne* receives the request and generates channel and routing information for the request. The configured event handlers for *daphne*
then unpack and parse the request message using the wire protocol mentioned above. This ensures that the connect has its context limited to only
receive messages for events it is interested in. *daphne* uses internal events to trigger further behavior, which will generate messages
and send them to the queue, that queue is processed by the *runworker*.
`daphne` receives the request and generates channel and routing information for the request. The configured event handlers for `daphne` then unpack and parse the request message using the wire protocol mentioned above. This ensures that the connection has its context limited to only receive messages for events it is interested in. `daphne` uses internal events to trigger further behavior, which will generate messages and send them to the queue, which is then processed by the `runworker`.
*runworker* processes the messages from the queue. This uses the contextual information of the message provided
by the *daphne* server and our *asgi_amqp* implementation to broadcast messages out to each client.
`runworker` processes the messages from the queue. This uses the contextual information of the message provided by the `daphne` server and our `asgi_amqp` implementation to broadcast messages out to each client.
### Development
- nginx listens on 8013/8043 instead of 80/443
- `nginx` listens on 8013/8043 instead of 80/443