Merge pull request #4767 from beeankha/awx_doc_edits

[WIP] Edit AWX Docs

Reviewed-by: https://github.com/apps/softwarefactory-project-zuul
softwarefactory-project-zuul[bot] 2019-09-23 14:30:12 +00:00 committed by GitHub
commit fcfd59ebe2
39 changed files with 927 additions and 809 deletions

## Ansible Runner Integration Overview
Much of the code in AWX around ansible and `ansible-playbook` invocation has been removed and put into the project `ansible-runner`. AWX now calls out to `ansible-runner` to invoke ansible and `ansible-playbook`.
### Lifecycle
In AWX, a task of a certain job type is kicked off (_e.g._, RunJob, RunProjectUpdate, RunInventoryUpdate, etc.) in `tasks.py`. A temp directory is built to house the `ansible-runner` parameters (_e.g._, `envvars`, `cmdline`, `extravars`, etc.). The temp directory is filled with the various concepts in AWX (_e.g._, SSH keys, extra vars, etc.). The code then builds a set of parameters to be passed to the `ansible-runner` Python module interface, `ansible_runner.interface.run()`. This is where AWX passes control to `ansible-runner`. Feedback is gathered by AWX via callbacks and handlers passed in; a sketch of how these might be wired up follows the list below.
The callbacks and handlers are:
* `event_handler`: Called each time a new event is created in `ansible-runner`. AWX will dispatch the event to `rabbitmq` to be processed on the other end by the callback receiver.
* `cancel_callback`: Called periodically by `ansible-runner`; this is so that AWX can inform `ansible-runner` if the job should be canceled or not.
* `finished_callback`: Called once by `ansible-runner` to denote that the process that was asked to run is finished. AWX will construct the special control event, `EOF`, with the associated total number of events that it observed.
* `status_handler`: Called by `ansible-runner` as the process transitions state internally. AWX uses the `starting` status to know that `ansible-runner` has made all of its decisions around the process that it will launch. AWX gathers and associates these decisions with the Job for historical observation.
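As a rough illustration (not the actual AWX code), here is a minimal sketch of how these handlers might be wired into `ansible_runner.interface.run()`; the private data directory and playbook name are placeholders:
```python
import ansible_runner

def event_handler(event):
    # AWX would dispatch each event to the callback receiver (via rabbitmq);
    # here we just print the event identifier.
    print("event:", event.get("uuid"))
    return True  # keep writing event artifacts

def cancel_callback():
    # Return True to tell ansible-runner that the job should be canceled.
    return False

def finished_callback(runner):
    # AWX uses this hook to emit its special `EOF` control event.
    print("finished with status:", runner.status)

def status_handler(status_data, runner_config=None):
    # Called as the process transitions state (e.g. 'starting', 'running').
    print("status:", status_data.get("status"))

result = ansible_runner.interface.run(
    private_data_dir="/tmp/example_private_data_dir",  # placeholder temp directory
    playbook="site.yml",                               # placeholder playbook
    event_handler=event_handler,
    cancel_callback=cancel_callback,
    finished_callback=finished_callback,
    status_handler=status_handler,
)
print(result.status, result.rc)
```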
### Debugging
If you want to debug `ansible-runner`, then set `AWX_CLEANUP_PATHS=False`, run a job, observe the job's `AWX_PRIVATE_DATA_DIR` property, and go to the node where the job was executed to inspect that directory.
If you want to debug the process that `ansible-runner` invoked (_i.e._, Ansible or `ansible-playbook`), then observe the Job's `job_env`, `job_cwd`, and `job_args` parameters.
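For example, a hypothetical sketch of pulling those fields from the job detail endpoint with the `requests` library (the host, credentials, and job id are placeholders):
```python
import requests

AWX_HOST = "https://awx.example.com"   # placeholder host
JOB_ID = 42                            # placeholder job id

resp = requests.get(
    f"{AWX_HOST}/api/v2/jobs/{JOB_ID}/",
    auth=("admin", "password"),  # placeholder credentials (or use a Bearer token header)
    verify=False,                # only if you have not set up a CA yet
)
job = resp.json()
print(job["job_cwd"])   # working directory the job ran from
print(job["job_args"])  # argument list passed to ansible/ansible-playbook
print(job["job_env"])   # (sanitized) environment of the invoked process
```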


When a user wants to log into Tower, she can explicitly choose some of the supported authentication methods:
* Github Team OAuth2
* Microsoft Azure Active Directory (AD) OAuth2
The remaining authentication methods use the same types of login info as Tower (username and password), but authenticate using external auth systems rather than Tower's own database. If some of these methods are enabled, Tower will try authenticating using the enabled methods *before Tower's own authentication method*. The order of precedence is:
* LDAP
* RADIUS
* TACACS+
* SAML
Tower will try authenticating against each enabled authentication method *in the specified order*, meaning if the same username and password is valid in multiple enabled auth methods (*e.g.*, both LDAP and TACACS+), Tower will only use the first positive match (in the above example, log a user in via LDAP and skip TACACS+).
## Notes:
SAML users, RADIUS users and TACACS+ users are categorized as 'Enterprise' users. The following rules apply to Enterprise users:
* Enterprise users can only be created via the first successful login attempt from the remote authentication backend.
* Enterprise users cannot be created/authenticated if a non-enterprise user with the same name has already been created in Tower.
* Tower passwords of Enterprise users should always be empty and cannot be set by any user if there are enterprise backends enabled.
* If enterprise backends are disabled, an Enterprise user can be converted to a normal Tower user by setting the password field. However, this operation is irreversible (the converted Tower user can no longer be treated as an Enterprise user).

# LDAP
The Lightweight Directory Access Protocol (LDAP) is an open, vendor-neutral, industry-standard application protocol for accessing and maintaining distributed directory information services over an Internet Protocol (IP) network. Directory services play an important role in developing intranet and Internet applications by allowing the sharing of information about users, systems, networks, services, and applications throughout the network.
# Configure LDAP Authentication
Please see the [Tower documentation](https://docs.ansible.com/ansible-tower/latest/html/administration/ldap_auth.html) as well as the [Ansible blog post](https://www.ansible.com/blog/getting-started-ldap-authentication-in-ansible-tower) for basic LDAP configuration.
LDAP Authentication provides duplicate sets of configuration fields for authentication with up to six different LDAP servers.
The default set of configuration fields takes the form `AUTH_LDAP_<field name>`. Configuration fields for additional LDAP servers are numbered `AUTH_LDAP_<n>_<field name>`.
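As an illustration of the naming scheme only (the hosts and bind credentials below are placeholders, and only a few of the available `AUTH_LDAP_*` fields are shown):
```python
# Default LDAP server: AUTH_LDAP_<field name>
AUTH_LDAP_SERVER_URI = "ldaps://ldap1.example.com:636"   # placeholder host
AUTH_LDAP_BIND_DN = "cn=svc-awx,dc=example,dc=com"       # placeholder bind DN
AUTH_LDAP_BIND_PASSWORD = "secret"                       # placeholder

# An additional LDAP server: AUTH_LDAP_<n>_<field name>
AUTH_LDAP_1_SERVER_URI = "ldaps://ldap2.example.com:636"
AUTH_LDAP_1_BIND_DN = "cn=svc-awx,dc=example,dc=com"
AUTH_LDAP_1_BIND_PASSWORD = "secret"
```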
## Test Environment Setup
Please see `README.md` of this repository: https://github.com/jangsutsr/deploy_ldap.git.
# Basic Setup for FreeIPA
LDAP Server URI (append if you have multiple LDAPs)
`ldaps://{{serverip1}}:636`

## Introduction
Starting from Tower 3.3, OAuth2 will be used as the new means of token-based authentication. Users
will be able to manage OAuth2 tokens as well as applications, a server-side representation of API
clients used to generate tokens. With OAuth2, a user can authenticate by passing a token as part of
the HTTP authentication header. The token can be scoped to have more restrictive permissions on top of
the base RBAC permissions of the user. Refer to [RFC 6749](https://tools.ietf.org/html/rfc6749) for
more details of the OAuth2 specification.
## Basic Usage
To get started using OAuth2 tokens to access the browsable API, this document will walk through the steps of acquiring a token and using it.
1. Make an application with `authorization_grant_type` set to 'password'. HTTP POST the following to the `/api/v2/applications/` endpoint (supplying your own `organization-id`):
```
{
"name": "Admin Internal Application",
...
"organization": <organization-id>
}
```
2. Make a token with a POST to the `/api/v2/tokens/` endpoint:
```
{
"description": "My Access Token",
...
}
```
This will return a `<token-value>` that you can use to authenticate with for future requests (this will not be shown again).
3. Use the token to access a resource. We will use `curl` to demonstrate this:
```
curl -H "Authorization: Bearer <token-value>" -X GET https://<awx>/api/v2/users/
```
> The `-k` flag may be needed if you have not set up a CA yet and are using SSL.
This token can be revoked by making a DELETE on the detail page for that token. All you need is that token's `id`. For example:
```
curl -ku <user>:<password> -X DELETE https://<awx>/api/v2/tokens/<pk>/
```
Similarly, using a token:
curl -H "Authorization: Bearer <token-value>" -X DELETE https://<awx>/api/v2/tokens/<pk>/ -k
```
## More Information
#### Managing OAuth2 Applications and Tokens
Applications and tokens can be managed as a top-level resource at `/api/v2/applications` and
`/api/v2/tokens`. These resources can also be accessed respective to the user at
`/api/v2/users/N/<resource>`. Applications can be created by making a POST to either `/api/v2/applications`
or `/api/v2/users/N/applications`.
Each OAuth2 application represents a specific API client on the server side. For an API client to use the API via an application token,
it must first have an application and issue an access token.
Individual applications will be accessible via their primary keys:
`/api/v2/applications/<pk>/`. Here is a typical application:
```
{
"id": 1,
...
}
```
In the above example, `user` is the primary key of the user associated with this application and `name` is
a human-readable identifier for the application. The other fields, like `client_id` and
`redirect_uris`, are mainly used for OAuth2 authorization, which will be covered later in the 'Using
OAuth2 Token System' section.
Fields `client_id` and `client_secret` are immutable identifiers of applications, and will be
generated during creation; fields `user` and `authorization_grant_type`, on the other hand, are
*immutable on update*, meaning they are required fields on creation, but will become read-only after
that.
**On the RBAC side:**
- System admins will be able to see and manipulate all applications in the system;
- Organization admins will be able to see and manipulate all applications belonging to Organization
members;
- Other normal users will only be able to see, update and delete their own applications, but
cannot create any new applications.
Tokens, on the other hand, are resources used to actually authenticate incoming requests and mask the
permissions of the underlying user. Tokens can be created by POSTing to the `/api/v2/tokens/`
endpoint and providing the `application` and `scope` fields to point to the related application and specify
the token scope, or by POSTing to `/api/v2/applications/<pk>/tokens/` and providing only `scope`, in which case
the parent application will be automatically linked (see the sketch below).
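A minimal sketch of both options using the `requests` library (the host, credentials, and application primary key are placeholders):
```python
import requests

AWX_HOST = "https://awx.example.com"   # placeholder host
AUTH = ("admin", "password")           # placeholder credentials

# Option 1: POST to /api/v2/tokens/, pointing at an existing application.
resp = requests.post(
    f"{AWX_HOST}/api/v2/tokens/",
    auth=AUTH,
    json={"application": 1, "scope": "write", "description": "example token"},
)
print(resp.json()["token"])  # shown only once, at creation time

# Option 2: POST to /api/v2/applications/<pk>/tokens/ with only a scope;
# the parent application is linked automatically.
resp = requests.post(
    f"{AWX_HOST}/api/v2/applications/1/tokens/",
    auth=AUTH,
    json={"scope": "read"},
)
print(resp.status_code)
```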
Individual tokens will be accessible via their primary keys at
`/api/v2/tokens/<pk>/`. Here is a typical token:
```
{
"id": 4,
...
"scope": "read"
},
```
For an OAuth2 token, the only fully mutable fields are `scope` and `description`. The `application`
field is *immutable on update*, and all other fields are totally immutable and will be auto-populated
during creation:
* `user` - this field corresponds to the user the token is created for.
* `expires` will be generated according to the Tower configuration setting `OAUTH2_PROVIDER` (a sample is sketched after this list).
* `token` and `refresh_token` will be auto-generated to be non-clashing random strings.
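For illustration, `OAUTH2_PROVIDER` is a dictionary of expiration-related settings; the keys and values below are examples only and may not match your installation's defaults:
```python
# Illustrative values only -- check the settings endpoint of your installation.
OAUTH2_PROVIDER = {
    "ACCESS_TOKEN_EXPIRE_SECONDS": 31536000000,   # lifetime of access tokens
    "AUTHORIZATION_CODE_EXPIRE_SECONDS": 600,     # lifetime of authorization codes
    "REFRESH_TOKEN_EXPIRE_SECONDS": 2628000,      # lifetime of refresh tokens
}
```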
Both application tokens and personal access tokens will be shown at the `/api/v2/tokens/`
endpoint. Personal access tokens can be identified by the `application` field being `null`.
**On the RBAC side:**
- A user will be able to create a token if they are able to see the related application;
- The System Administrator is able to see and manipulate every token in the system;
- Organization admins will be able to see and manipulate all tokens belonging to Organization
members;
- System Auditors can see all tokens and applications;
- Other normal users will only be able to see and manipulate their own tokens.
> Note: Users can see the token or refresh-token _value_ at the time of creation ONLY.
#### Using OAuth2 Token System for Personal Access Tokens (PAT)
The most common usage of OAuth2 is authenticating users. The `token` field of a token is used
as part of the HTTP authentication header, in the format `Authorization: Bearer <token field value>`. This _Bearer_
token can be obtained by doing a curl to the `/api/o/token/` endpoint.
Here is an example of using that PAT to access an API endpoint using `curl`:
```
curl -H "Authorization: Bearer kqHqxfpHGRRBXLNCOXxT5Zt3tpJogn" http://<awx>/api/v2/credentials/
```
According to the OAuth2 specification, users should be able to acquire, revoke and refresh an access
token. In AWX the equivalent, and easiest, way of doing that is creating a token, deleting
a token, and deleting a token quickly followed by creating a new one.
The specification also provides standard ways of doing this. RFC 6749 elaborates
on those topics, but in summary, an OAuth2 token is officially acquired via authorization using
authorization information provided by applications (special application fields mentioned above).
There are dedicated endpoints for authorization and acquiring tokens. The `token` endpoint
is also responsible for token refresh, and token revoke can be done by the dedicated token revoke endpoint.
In AWX, our OAuth2 system is built on top of
[Django OAuth Toolkit](https://django-oauth-toolkit.readthedocs.io/en/latest/), which provides full
support for standard authorization, token revoke and refresh. AWX implements them and puts related
endpoints under `/api/o/`. Detailed examples of the most typical usage of those endpoints
are available as the description text of `/api/o/`. See below for information on Application Access Token usage.
> Note: The `/api/o/` endpoints can only be used for application tokens, and are not valid for personal access tokens.
#### Token Scope Mask Over RBAC System
The scope of an OAuth2 token is a space-separated string composed of keywords like 'read' and 'write'.
These keywords are configurable and used to specify permission level of the authenticated API client.
For the initial OAuth2 implementation, we use the simplest scope configuration, where the only
valid scope keywords are 'read' and 'write'.
Read and write scopes provide a mask layer over the RBAC permission system of AWX. Specifically, a 'read'
scope gives the authenticated user only the read permissions that the RBAC system provides.
For example, if a user has admin permission to a job template, he/she can see, modify, launch
and delete the job template if authenticated via session or basic auth. On the other hand, if the user
is authenticated using an OAuth2 token, and the related token scope is 'read', the user can only see but
not manipulate or launch the job template, despite being an admin. If the token scope is
'write' or 'read write', she can take full advantage of the job template as its admin. Note that 'write'
implies 'read' as well.
## Application Functions
This page lists OAuth2 utility endpoints used for authorization, token refresh and revoke.
Note endpoints other than `/api/o/authorize/` are not meant to be used in browsers and do not
support HTTP GET. The endpoints here strictly follow
[RFC specs for OAuth2](https://tools.ietf.org/html/rfc6749), so please use that for detailed
reference. Below are some examples to demonstrate the typical usage of these endpoints in
the AWX context (note that the AWX net location defaults to `http://localhost:8013` in these examples).
#### Application Using `authorization code` Grant Type
This application grant type is intended to be used when the application is executing on the server. To create
an application named `AuthCodeApp` with the `authorization-code` grant type,
make a POST to the `/api/v2/applications/` endpoint:
```text
{
"name": "AuthCodeApp",
...
"skip_authorization": false
}
```
You can test the authorization flow out with this new application by copying the `client_id` and URI link into the
homepage [here](http://django-oauth-toolkit.herokuapp.com/consumer/) and clicking submit. This is just a simple test
application that `django-oauth-toolkit` provides.
From the client app, the user makes a GET to the Authorize endpoint with the `response_type`,
`client_id`, `redirect_uris`, and `scope`. AWX will respond with the authorization `code` and `state`
to the `redirect_uri` specified in the application. The client application will then make a POST to the
`/api/o/token/` endpoint on AWX with the `code`, `client_id`, `client_secret`, `grant_type`, and `redirect_uri`.
AWX will respond with the `access_token`, `token_type`, `refresh_token`, and `expires_in`. For more
information on testing this flow, refer to [django-oauth-toolkit](http://django-oauth-toolkit.readthedocs.io/en/latest/tutorial/tutorial_01.html#test-your-authorization-server).
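A simplified sketch of those two steps (the client id/secret and redirect URI are placeholders, and obtaining the `code` normally happens in the user's browser):
```python
import requests
from urllib.parse import urlencode

AWX_HOST = "http://localhost:8013"                    # matches the examples in this section
CLIENT_ID = "<client_id>"                             # placeholder
CLIENT_SECRET = "<client_secret>"                     # placeholder
REDIRECT_URI = "http://localhost/consumer/exchange/"  # placeholder

# Step 1: send the user's browser to the authorize endpoint.
authorize_url = f"{AWX_HOST}/api/o/authorize/?" + urlencode({
    "response_type": "code",
    "client_id": CLIENT_ID,
    "redirect_uri": REDIRECT_URI,
    "scope": "read",
})
print("Open in a browser:", authorize_url)

# Step 2: after AWX redirects back with ?code=..., exchange the code for tokens.
code = "<code returned to the redirect_uri>"          # placeholder
resp = requests.post(
    f"{AWX_HOST}/api/o/token/",
    data={
        "grant_type": "authorization_code",
        "code": code,
        "redirect_uri": REDIRECT_URI,
    },
    auth=(CLIENT_ID, CLIENT_SECRET),
)
print(resp.json())  # access_token, token_type, refresh_token, expires_in
```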
#### Application Using `password` Grant Type
This is also called the `resource owner credentials grant`. This is for use by users who have
native access to the web app. This should be used when the client is the Resource owner. Suppose
we have an application `Default Application` with grant type `password`:
```text
{
...
}
```
Login is not required for the `password` grant type, so we can simply use `curl` to acquire a personal access token
via `/api/o/token/`:
```bash
curl -X POST \
IaUBsaVDgt2eiwOGe0bg5m5vCSstClZmtdy359RVx2rQK5YlIWyPlrolpt2LEpVeKXWaiybo" \
http://<awx>/api/o/token/ -i
```
In the above POST request, parameters `username` and `password` are the username and password of the related
AWX user of the underlying application, and the authentication information is of the format
`<client_id>:<client_secret>`, where `client_id` and `client_secret` are the corresponding fields of the
underlying application.
Upon success, the access token, refresh token and other information are given in the response body in JSON
format:
```text
HTTP/1.1 200 OK
...
Strict-Transport-Security: max-age=15768000
{"access_token": "9epHOqHhnXUcgYK8QanOmUQPSgX92g", "token_type": "Bearer", "expires_in": 315360000000, "refresh_token": "jMRX6QvzOTf046KHee3TU5mT3nyXsz", "scope": "read"}
```
## Token Functions
#### Refresh an Existing Access Token
Suppose we have an existing access token with refresh token provided:
```text
{
...
"scope": "read write"
}
```
The `/api/o/token/` endpoint is used for refreshing the access token:
```bash
curl -X POST \
-d "grant_type=refresh_token&refresh_token=AL0NK9TTpv0qp54dGbC4VUZtsZ9r8z" \
-u "gwSPoasWSdNkMDtBN3Hu2WYQpPWCO9SwUEsKK22l:fI6ZpfocHYBGfm1tP92r0yIgCyfRdDQt0Tos9L8a4fNsJjQQMwp9569eIaUBsaVDgt2eiwOGe0bg5m5vCSstClZmtdy359RVx2rQK5YlIWyPlrolpt2LEpVeKXWaiybo" \
http://<awx>/api/o/token/ -i
```
In the above POST request, `refresh_token` is provided by the `refresh_token` field of the access token
above. The authentication information is of the format `<client_id>:<client_secret>`, where `client_id`
and `client_secret` are the corresponding fields of the underlying related application of the access token.
Internally, the refresh operation deletes the existing token and a new token is created immediately
after, with information like scope and related application identical to the original one. We can
verify this by checking that the new token is present and the old token is deleted at the `/api/v2/tokens/` endpoint.
#### Revoke an Access Token
##### Alternatively Revoke Using the `/api/o/revoke-token/` Endpoint
Revoking an access token by this method is the same as deleting the token resource object, but it allows you to delete a token by providing its token value, and the associated `client_id` (and `client_secret` if the application is `confidential`). For example:
```bash
curl -X POST -d "token=rQONsve372fQwuc2pn76k3IHDCYpi7" \
-u "gwSPoasWSdNkMDtBN3Hu2WYQpPWCO9SwUEsKK22l:fI6ZpfocHYBGfm1tP92r0yIgCyfRdDQt0Tos9L8a4fNsJjQQMwp9569eIaUBsaVDgt2eiwOGe0bg5m5vCSstClZmtdy359RVx2rQK5YlIWyPlrolpt2LEpVeKXWaiybo" \
```
`200 OK` means a successful delete.
We can verify the effect by checking if the token is no longer present
at `/api/v2/tokens/`.
## Acceptance Criteria
* All CRUD operations for OAuth2 applications and tokens should function as described.
* RBAC rules applied to OAuth2 applications and tokens should behave as described.
* A default application should be auto-created for each new user.
* Incoming requests using an unexpired OAuth2 token correctly in the authentication header should be able
to successfully authenticate themselves.
* Token scope mask over RBAC should work as described.
* Tower configuration setting `OAUTH2_PROVIDER` should be configurable and function as described.
* The `/api/o/` endpoint should work as expected. Specifically, all examples given in the description
help text should be working (a user following the steps should get the expected result).

# SAML
Security Assertion Markup Language, or SAML, is an open standard for exchanging authentication and/or authorization data between an identity provider (*e.g.*, LDAP) and a service provider (*e.g.*, AWX). More concretely, AWX can be configured to talk with SAML in order to authenticate (create/login/logout) users of AWX. User Team and Organization membership can be embedded in the SAML response to AWX.
# Configure SAML Authentication
Please see the [Tower documentation](https://docs.ansible.com/ansible-tower/latest/html/administration/ent_auth.html#saml-authentication-settings) as well as the [Ansible blog post](https://www.ansible.com/blog/using-saml-with-red-hat-ansible-tower) for basic SAML configuration. Note that AWX's SAML implementation relies on `python-social-auth`, which uses `python-saml`. AWX exposes three fields which are directly passed to the lower libraries (a sample is sketched below):
* `SOCIAL_AUTH_SAML_SP_EXTRA` is passed to the `python-saml` library configuration's `sp` setting.
* `SOCIAL_AUTH_SAML_SECURITY_CONFIG` is passed to the `python-saml` library configuration's `security` setting.
* `SOCIAL_AUTH_SAML_EXTRA_DATA`
See http://python-social-auth-docs.readthedocs.io/en/latest/backends/saml.html#advanced-settings for more information.
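As a rough, hypothetical illustration of how these three fields might be populated — the keys shown are examples taken from the `python-saml`/`python-social-auth` documentation, not a required or complete set:
```python
# Handed to python-saml's "security" setting (illustrative keys only).
SOCIAL_AUTH_SAML_SECURITY_CONFIG = {
    "requestedAuthnContext": False,
    "wantAssertionsSigned": True,
}

# Handed to python-saml's "sp" setting (illustrative key only).
SOCIAL_AUTH_SAML_SP_EXTRA = {
    "assertionConsumerService": {"binding": "urn:oasis:names:tc:SAML:2.0:bindings:HTTP-POST"},
}

# Extra SAML attributes to persist on the social-auth record (illustrative).
SOCIAL_AUTH_SAML_EXTRA_DATA = [("Department", "department")]
```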
# Configure SAML for Team and Organization Membership
AWX can be configured to look for particular attributes that contain AWX Team and Organization membership to associate with users when they log in to AWX. The attribute names are defined in AWX settings, specifically in the *SAML Team Attribute Mapping* and *SAML Organization Attribute Mapping* fields under the SAML sub-category of the authentication settings tab. The meaning and usefulness of these settings is best communicated through example.
### Example SAML Organization Attribute Mapping
Below is an example SAML attribute that embeds user organization membership in the attribute *member-of*.
```
<saml2:AttributeStatement>
<saml2:Attribute FriendlyName="member-of" Name="member-of" NameFormat="urn:oasis:names:tc:SAML:2.0:attrname-format:unspecified">
<saml2:AttributeValue>Engineering</saml2:AttributeValue>
<saml2:AttributeValue>IT</saml2:AttributeValue>
  </saml2:Attribute>
  <saml2:Attribute ...>
<saml2:AttributeValue>IT</saml2:AttributeValue>
<saml2:AttributeValue>HR</saml2:AttributeValue>
</saml2:Attribute>
</saml2:AttributeStatement>
```
Below is the corresponding AWX configuration:
```
{
"saml_attr": "member-of",
...
"remove_admins": true
}
```
**saml_attr:** The SAML attribute name where the organization array can be found.
**remove:** Set this to `true` to remove a user from all organizations before adding the user to the list of Organizations. Set it to `false` to keep the user in whatever Organization(s) they are in while adding the user to the Organization(s) in the SAML attribute.
**saml_admin_attr:** The SAML attribute name where the organization administrators array can be found.
**remove_admins:** Set this to `true` to remove a user from all organizations that they are administrators of before adding the user to the list of Organization admins. Set it to `false` to keep the user in whatever Organization(s) they are in as admin while adding the user as an Organization administrator in the SAML attribute.
### Example SAML Team Attribute Mapping
Below is another example of a SAML attribute that contains a Team membership in a list:
```
<saml:AttributeStatement>
  <saml:Attribute ...>
    ...
  </saml:Attribute>
</saml:AttributeStatement>
```
**saml_attr:** The SAML attribute name where the team array can be found.
**remove:** Set this to `true` to remove a user from all Teams before adding the user to the list of Teams. Set this to `false` to keep the user in whatever Team(s) they are in while adding the user to the Team(s) in the SAML attribute.
**team_org_map:** An array of dictionaries of the form `{ "team": "<AWX Team Name>", "organization": "<AWX Org Name>" }` which defines the mapping from AWX Team -> AWX Organization. This is needed because the same named Team can exist in multiple Organizations in Tower. The Organization to which a Team listed in a SAML attribute belongs would be ambiguous without this mapping.

## Introduction
Before Tower 3.3, an auth token was used as the main authentication method. Starting from Tower 3.3,
session-based authentication will take its place as the main authentication method, and auth tokens
will be replaced by OAuth2 tokens.
Session authentication is a safer way of utilizing HTTP(S) cookies. Theoretically, the user can provide authentication information, like username and password, as part of the
`Cookie` header, but this method is vulnerable to cookie hijacks, where crackers can see and steal user
information from the cookie payload.
Session authentication, on the other hand, sets a single `session_id` cookie. The `session_id`
is *a random string which will be mapped to user authentication information by the server*. Crackers who
hijack cookies will only get the `session_id` itself, which does not imply any critical user info, is valid only for
a limited time, and can be revoked at any time.
> Note: The CSRF token will by default allow HTTP. To increase security, the `CSRF_COOKIE_SECURE` setting should
be set to `True`.
## Usage
In session authentication, users log in using the `/api/login/` endpoint. A GET to `/api/login/` displays the
login page of the API browser:
![Example session log in page](../img/auth_session_1.png?raw=true)
Users should enter a correct username and password before clicking the 'LOG IN' button, which fires a POST
to `/api/login/` to actually log the user in. The return code of a successful login is 302, meaning upon
successful login, the browser will be redirected; the redirected destination is determined by the `next` form
item described below.
It should be noted that the POST body of `/api/login/` is *not* in JSON, but in HTTP form format. Four items should
be provided in the form (see the sketch after this list):
* `username`: The username of the user trying to log in.
* `password`: The password of the user trying to log in.
* `next`: The path of the redirect destination; in the API browser, `"/api/"` is used.
* `csrfmiddlewaretoken`: The CSRF token, usually populated by using Django template `{% csrf_token %}`.
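A minimal sketch of this login flow using the `requests` library (host and credentials are placeholders; the CSRF token is read from the cookie set by the initial GET):
```python
import requests

AWX_HOST = "https://awx.example.com"   # placeholder host
session = requests.Session()

# GET the login page first so the server sets a csrftoken cookie.
session.get(f"{AWX_HOST}/api/login/")

# POST the HTTP form (not JSON) with the four expected items.
resp = session.post(
    f"{AWX_HOST}/api/login/",
    data={
        "username": "admin",        # placeholder
        "password": "password",     # placeholder
        "next": "/api/",
        "csrfmiddlewaretoken": session.cookies.get("csrftoken"),
    },
    headers={"Referer": f"{AWX_HOST}/api/login/"},  # Django checks the Referer for CSRF over HTTPS
    allow_redirects=False,
)
print(resp.status_code)                  # 302 on success
print(session.cookies.get("sessionid"))  # the session cookie described below
```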
The `session_id` is provided as a return `Set-Cookie` header. Here is a typical one:
```
Set-Cookie: sessionid=lwan8l5ynhrqvps280rg5upp7n3yp6ds; expires=Tue, 21-Nov-2017 16:33:13 GMT; httponly; Max-Age=1209600; Path=/
```
The duration of the cookie is configurable by Tower Configuration setting `SESSION_COOKIE_AGE` under
category `authentication`. It is an integer denoting the number of seconds the session cookie should
live. The default session cookie age is two weeks.
After a valid session is acquired, a client should provide the `session_id` as a cookie for subsequent requests
in order to be authenticated. For example:
```
Cookie: sessionid=lwan8l5ynhrqvps280rg5upp7n3yp6ds; ...
```
Users should use the `/api/logout/` endpoint to log out. In the API browser, a logged-in user can do that by
simply clicking the logout button on the nav bar. Under the hood, the click issues a GET to `/api/logout/`.
Upon success, the server will invalidate the current session and the response header will tell the client
to delete the session cookie. The user should no longer try using this invalid session.
The duration of a session is constant. However, a user can extend the expiration date of a valid session
by performing the session acquisition again with the existing session provided.
A Tower configuration setting, `SESSIONS_PER_USER` under category `authentication`, is used to set the
maximum number of valid sessions a user can have at the same time. For example, if `SESSIONS_PER_USER`
is set to three and the same user is logged in from five different places, the earliest two sessions created will be invalidated. Tower will try
broadcasting, via websocket, to all available clients. The websocket message body will contain a list of
invalidated sessions. If a client finds its session in that list, it should try logging out.
Unlike tokens, sessions are meant to be short-lived and UI-only; therefore, whenever a user's password
is updated, all sessions she owned will be invalidated and deleted.
## Acceptance Criteria
* Users should be able to log in via the `/api/login/` endpoint by correctly providing all necessary fields.
* Logged-in users should be able to authenticate themselves by providing correct session auth info.
* Logged-in users should be able to log out via `/api/logout/`.
* The duration of a session cookie should be configurable by `SESSION_COOKIE_AGE`.
* The maximum number of concurrent logins for one user should be configurable by `SESSIONS_PER_USER`,
and over-limit user sessions should be warned via websocket.
* When a user's password is changed, all her sessions should be invalidated and deleted.
* Users should not be able to authenticate by HTTP(S) request nor websocket connection using invalid
sessions.
* No existing behavior, like job runs, inventory updates or the callback receiver, should be affected
by session auth.

# TACACS+
[Terminal Access Controller Access-Control System Plus (TACACS+)](https://en.wikipedia.org/wiki/TACACS) is a protocol developed by Cisco to handle remote authentication and related services for networked access control through a centralized server. Specifically, TACACS+ provides authentication, authorization and accounting (AAA) services. Ansible Tower currently utilizes its authentication service.
TACACS+ is configured by Tower configuration and is available under `/api/v2/settings/tacacsplus/`. Here is a typical configuration with every configurable field included:
```
{
"TACACSPLUS_HOST": "127.0.0.1",
...
"TACACSPLUS_AUTH_PROTOCOL": "ascii"
}
```
Each field is explained below:
| Field Name | Field Value Type | Field Value Default | Description |
|------------------------------|---------------------|---------------------|--------------------------------------------------------------------|
| `TACACSPLUS_PORT` | Integer | 49 | Port number of TACACS+ server. |
| `TACACSPLUS_SECRET` | String | '' (empty string) | Shared secret for authenticating to TACACS+ server. |
| `TACACSPLUS_SESSION_TIMEOUT` | Integer | 5 | TACACS+ session timeout value in seconds. |
| `TACACSPLUS_AUTH_PROTOCOL` | String with choices | 'ascii' | The authentication protocol used by TACACS+ client (choices are `ascii` and `pap`). |
Under the hood, Tower uses the [open-source TACACS+ Python client](https://github.com/ansible/tacacs_plus) to communicate with the remote TACACS+ server. During authentication, Tower passes the username and password to the TACACS+ client, which packs up the auth information and sends it to the TACACS+ server. Based on what the server returns, Tower will invalidate the login attempt if authentication fails. If authentication passes, Tower will create the user if she does not exist in the database, and log the user in.
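For illustration only (this is not the actual Tower code path), roughly what that exchange looks like when using the `tacacs_plus` client directly, with placeholder connection details:
```python
from tacacs_plus.client import TACACSClient

# Placeholder connection details that mirror the configuration fields above.
client = TACACSClient("127.0.0.1", 49, "secret", timeout=5)

# ASCII authentication (the default TACACSPLUS_AUTH_PROTOCOL).
reply = client.authenticate("tower", "login")
print("authentication passed" if reply.valid else "authentication failed")
```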
## Test Environment Setup
The suggested TACACS+ server for testing is [shrubbery TACACS+ daemon](http://www.shrubbery.net/tac_plus/). It is supposed to run on a CentOS machine. A verified candidate is CentOS 6.3 AMI in AWS EC2 Community AMIs (search for `CentOS 6.3 x86_64 HVM - Minimal with cloud-init aws-cfn-bootstrap and ec2-api-tools`). Note that it is required to keep TCP port 49 open, since it's the default port used by the TACACS+ daemon.
We provide [a playbook](https://github.com/jangsutsr/ansible-role-tacacs) to install a working TACACS+ server. Here is a typical test setup using the provided playbook:
1. In AWS EC2, spawn a CentOS 6 machine.
2. In Tower, create a test project using the stand-alone playbook inventory.
3. In Tower, create a test inventory whose only host is the spawned CentOS machine.
4. In Tower, create and run a job template using the created project and inventory, with parameters set up as below:
![Example tacacs+ setup jt parameters](../img/auth_tacacsplus_1.png?raw=true)
The playbook creates a user named 'tower' whose ASCII password defaults to 'login' (modifiable by the extra var `ascii_password`) and whose PAP password defaults to 'papme' (modifiable by the extra var `pap_password`). In order to configure the TACACS+ server to meet custom test needs, we need to modify the server-side file `/etc/tac_plus.conf` and run `sudo service tac_plus restart` to restart the daemon. Details on how to modify the config file can be found [here](http://manpages.ubuntu.com/manpages/xenial/man5/tac_plus.conf.5.html).
## Acceptance Criteria
* All specified Tower configuration fields should be shown and configurable as documented.
* A user defined by the TACACS+ server should be able to log into Tower.
* A user not defined by the TACACS+ server should not be able to log into Tower via TACACS+.
* A user existing in the TACACS+ server but not in Tower should be created after the first successful login.
* The TACACS+ backend should stop an authentication attempt after the configured timeout and should not block the authentication pipeline in any case.
* If exceptions occur on the TACACS+ server side, the exception details should be logged in Tower, and Tower should not authenticate that user via TACACS+.

## Ansible Tower Capacity Determination and Job Impact
The Ansible Tower capacity system determines how many jobs can run on an Instance given the amount of resources
available to the Instance and the size of the jobs that are running (referred to hereafter as `Impact`).
The algorithm used to determine this is based entirely on two things:
* How much memory is available to the system (`mem_capacity`)
Capacity also impacts Instance Groups. Since Groups are composed of Instances, likewise Instances can be
assigned to multiple Groups. This means that impact to one Instance can potentially affect the overall capacity of
other Groups.
Instance Groups (not Instances themselves) can be assigned to be used by Jobs at various levels (see [Tower Clustering/HA Overview](https://github.com/ansible/awx/blob/devel/docs/clustering.md)).
When the Task Manager is preparing its graph to determine which Group a Job will run on, it will commit the capacity of
an Instance Group to a Job that hasn't started or isn't ready to start yet (see [Task Manager Overview](https://github.com/ansible/awx/blob/devel/docs/task_manager_system.md)).
Finally, if only one Instance is available (especially in smaller configurations) for a Job to run, the Task Manager will allow that
Job to run on the Instance even if it would push the Instance over capacity. We do this as a way to guarantee that jobs
themselves won't get clogged as a result of an under-provisioned system.
These concepts mean that, in general, Capacity and Impact are not a zero-sum system relative to Jobs and Instances/Instance Groups.
### Resource Determination For Capacity Algorithm
The capacity algorithms are defined in order to determine how many `forks` a system is capable of running at the same time. This controls how
many systems Ansible itself will communicate with simultaneously. Increasing the number of forks a Tower system is running will, in general,
allow jobs to run faster by performing more work in parallel. The tradeoff is that this will increase the load on the system which could cause work
to slow down overall.
Tower can operate in two modes when determining capacity. `mem_capacity` (the default) will allow you to overcommit CPU resources while protecting the system
from running out of memory. If most of your work is not CPU-bound, then selecting this mode will maximize the number of forks.
#### Memory Relative Capacity
`mem_capacity` is calculated relative to the amount of memory needed per fork. Taking into account the overhead for Tower's internal components, this comes out
to be about `100MB` per fork. When considering the amount of memory available to Ansible jobs, the capacity algorithm will reserve 2GB of memory to account
for the presence of other Tower services. The algorithm itself looks like this:
(mem - 2048) / mem_per_fork
As an example:
(4096 - 2048) / 100 == ~20
So a system with 4GB of memory would be capable of running 20 forks. The value `mem_per_fork` can be controlled by setting the Tower settings value
(or environment variable) `SYSTEM_TASK_FORKS_MEM` which defaults to `100`.
#### CPU-Relative Capacity
Oftentimes, Ansible workloads can be fairly CPU-bound. In these cases, sometimes reducing the simultaneous workload allows more tasks to run faster and reduces
the average time-to-completion of those jobs.
Just as the Tower `mem_capacity` algorithm uses the amount of memory need per-fork, the `cpu_capacity` algorithm looks at the amount of cpu resources is needed
per fork. The baseline value for this is `4` forks per-core. The algorithm itself looks like this:
Just as the Tower `mem_capacity` algorithm uses the amount of memory needed per fork, the `cpu_capacity` algorithm looks at the amount of CPU resources needed
per fork. The baseline value for this is `4` forks per core. The algorithm itself looks like this:
cpus * fork_per_cpu
For example a 4-core system:
For example, in a 4-core system:
4 * 4 == 16
The value `fork_per_cpu` can be controlled by setting the Tower settings value (or environment variable) `SYSTEM_TASK_FORKS_CPU` which defaults to `4`.
The value `fork_per_cpu` can be controlled by setting the Tower settings value (or environment variable) `SYSTEM_TASK_FORKS_CPU`, which defaults to `4`.
### Job Impacts Relative To Capacity
When selecting the capacity it's important to understand how each job type affects capacity.
When selecting the capacity, it's important to understand how each job type affects it.
It's helpful to understand what `forks` mean to Ansible: http://docs.ansible.com/ansible/latest/intro_configuration.html#forks
The default forks value for ansible is `5`. However, if Tower knows that you're running against fewer systems than that then the actual concurrency value
The default forks value for ansible is `5`. However, if Tower knows that you're running against fewer systems than that, then the actual concurrency value
will be lower.
When a job is run, Tower will add `1` to the number of forks selected to compensate for the Ansible parent process. So if you are running a playbook against `5`
systems with a `forks` value of `5` then the actual `forks` value from the perspective of Job Impact will be 6.
When a job is made to run, Tower will add `1` to the number of forks selected to compensate for the Ansible parent process. So if you are running a playbook against `5`
systems with a `forks` value of `5`, then the actual `forks` value from the perspective of Job Impact will be 6.
#### Impact of Job types in Tower
#### Impact of Job Types in Tower
Jobs and Ad-hoc jobs follow the above model `forks + 1`.
Jobs and Ad-hoc jobs follow the above model, `forks + 1`.
Other job types have a fixed impact:
@ -84,16 +86,15 @@ Other job types have a fixed impact:
* Project Updates: 1
* System Jobs: 5
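To make the arithmetic concrete, here is a rough sketch (not AWX's actual implementation) combining the `forks + 1` rule with the fixed impacts listed above:

```python
# Illustrative only: regular and ad hoc jobs count as forks + 1,
# while some other job types carry a fixed impact.
def task_impact(job_type, forks=5, host_count=None):
    fixed_impact = {"project_update": 1, "system_job": 5}  # values listed above
    if job_type in fixed_impact:
        return fixed_impact[job_type]
    # Ansible will not use more forks than there are targeted hosts
    effective_forks = min(forks, host_count) if host_count is not None else forks
    return effective_forks + 1  # +1 for the Ansible parent process

print(task_impact("job", forks=5, host_count=5))  # -> 6, matching the example above
```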
### Selecting the right capacity
### Selecting the Right Capacity
Selecting between a `memory` focused capacity algorithm and a `cpu` focused capacity for your Tower use means you'll be selecting between a minimum
and maximum value. In the above examples the CPU capacity would allow a maximum of 16 forks while the Memory capacity would allow 20. For some systems
the disparity between these can be large and often times you may want to have a balance between these two.
Selecting between a memory-focused capacity algorithm and a CPU-focused capacity for your Tower use means you'll be selecting between a minimum
and maximum value. In the above examples, the CPU capacity would allow a maximum of 16 forks while the Memory capacity would allow 20. For some systems,
the disparity between these can be large and oftentimes you may want to have a balance between these two.
An `Instance` field `capacity_adjustment` allows you to select how much of one or the other you want to consider. It is represented as a value between 0.0
and 1.0. If set to a value of `1.0` then the largest value will be used. In the above example, that would be Memory capacity so a value of `20` forks would
An Instance field, `capacity_adjustment`, allows you to select how much of one or the other you want to consider. It is represented as a value between `0.0`
and `1.0`. If set to a value of `1.0`, then the largest value will be used. In the above example, that would be Memory capacity, so a value of `20` forks would
be selected. If set to a value of `0.0`, then the smallest value will be used. A value of `0.5` would be a 50/50 balance between the two algorithms, which would
be `18`:
16 + (20 - 16) * 0.5 == 18
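Putting the pieces together, the blend can be sketched as follows (using the example values of 16 CPU-based forks and 20 memory-based forks; this is an illustration, not AWX's actual code):

```python
# Sketch of how capacity_adjustment interpolates between the two algorithms.
def adjusted_capacity(cpu_capacity, mem_capacity, capacity_adjustment):
    lower, upper = sorted((cpu_capacity, mem_capacity))
    return lower + (upper - lower) * capacity_adjustment

print(adjusted_capacity(16, 20, 0.0))  # -> 16.0 (smallest value)
print(adjusted_capacity(16, 20, 0.5))  # -> 18.0 (50/50 balance)
print(adjusted_capacity(16, 20, 1.0))  # -> 20.0 (largest value)
```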

View File

@ -85,9 +85,9 @@ hostC rabbitmq_host=10.1.0.3
- `rabbitmq_use_long_names` - RabbitMQ is pretty sensitive to what each instance is named. We are flexible enough to allow FQDNs (_host01.example.com_), short names (`host01`), or IP addresses (`192.168.5.73`). Depending on what is used to identify each host in the `inventory` file, this value may need to be changed. For FQDNs and IP addresses, this value needs to be `true`. For short names, it should be `false`.
- `rabbitmq_enable_manager` - Setting this to `true` will expose the RabbitMQ management web console on each instance.
The most important field to point out for variability is `rabbitmq_use_long_name`. This cannot be detected and no reasonable default is provided for it, so it's important to point out when it needs to be changed. If instances are provisioned to where they reference other instances internally and not on external addresses then `rabbitmq_use_long_name` semantics should follow the internal addressing (aka `rabbitmq_host`).
The most important field to point out for variability is `rabbitmq_use_long_name`. This cannot be detected and no reasonable default is provided for it, so it's important to point out when it needs to be changed. If instances are provisioned to where they reference other instances internally and not on external addresses, then `rabbitmq_use_long_name` semantics should follow the internal addressing (*i.e.*, `rabbitmq_host`).
Other than `rabbitmq_use_long_name` the defaults are pretty reasonable:
Other than `rabbitmq_use_long_name`, the defaults are pretty reasonable:
```
rabbitmq_port=5672
rabbitmq_vhost=tower
@ -105,9 +105,9 @@ Recommendations and constraints:
- Do not name any instance the same as a group name.
### Security Isolated Rampart Groups
### Security-Isolated Rampart Groups
In Tower versions 3.2+ customers may optionally define isolated groups inside of security-restricted networking zones from which to run jobs and ad hoc commands. Instances in these groups will _not_ have a full install of Tower, but will have a minimal set of utilities used to run jobs. Isolated groups must be specified in the inventory file prefixed with `isolated_group_`. An example inventory file is shown below:
In Tower versions 3.2+, customers may optionally define isolated groups inside of security-restricted networking zones from which to run jobs and ad hoc commands. Instances in these groups will _not_ have a full install of Tower, but will have a minimal set of utilities used to run jobs. Isolated groups must be specified in the inventory file prefixed with `isolated_group_`. An example inventory file is shown below:
```
[tower]
@ -154,18 +154,18 @@ Recommendations for system configuration with isolated groups:
Isolated Instance Authentication
--------------------------------
By default - at installation time - a randomized RSA key is generated and distributed as an authorized key to all "isolated" instances. The private half of the key is encrypted and stored within Tower, and is used to authenticat from "controller" instances to "isolated" instances when jobs are run.
At installation time, by default, a randomized RSA key is generated and distributed as an authorized key to all "isolated" instances. The private half of the key is encrypted and stored within Tower, and is used to authenticate from "controller" instances to "isolated" instances when jobs are run.
For users who wish to manage SSH authentication from controlling instances to isolated instances via some system _outside_ of Tower (such as externally-managed passwordless SSH keys), this behavior can be disabled by unsetting two Tower API settings values:
For users who wish to manage SSH authentication from controlling instances to isolated instances via some system _outside_ of Tower (such as externally-managed, password-less SSH keys), this behavior can be disabled by unsetting two Tower API settings values:
`HTTP PATCH /api/v2/settings/jobs/ {'AWX_ISOLATED_PRIVATE_KEY': '', 'AWX_ISOLATED_PUBLIC_KEY': ''}`
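For reference, the same call issued with `curl` might look like the following sketch (hostname and authentication are placeholders; use whatever admin credentials or token your install requires):

```shell
~ curl -sik "https://awx.example.org/api/v2/settings/jobs/" \
    -X PATCH \
    -H "Content-Type: application/json" \
    -u admin:password \
    -d '{"AWX_ISOLATED_PRIVATE_KEY": "", "AWX_ISOLATED_PUBLIC_KEY": ""}'
```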
### Provisioning and Deprovisioning Instances and Groups
* **Provisioning** - Provisioning Instances after installation is supported by updating the `inventory` file and re-running the setup playbook. It's important that this file contain all passwords and information used when installing the cluster or other instances may be reconfigured (this could be intentional).
* **Provisioning** - Provisioning Instances after installation is supported by updating the `inventory` file and re-running the setup playbook. It's important that this file contain all passwords and information used when installing the cluster, or other instances may be reconfigured (this can be done intentionally).
* **Deprovisioning** - Tower does not automatically de-provision instances since it cannot distinguish between an instance that was taken offline intentionally or due to failure. Instead the procedure for deprovisioning an instance is to shut it down (or stop the `ansible-tower-service`) and run the Tower deprovision command:
* **Deprovisioning** - Tower does not automatically de-provision instances since it cannot distinguish between an instance that was taken offline intentionally or due to failure. Instead, the procedure for de-provisioning an instance is to shut it down (or stop the `ansible-tower-service`) and run the Tower de-provision command:
```
$ awx-manage deprovision_instance --hostname=<hostname>
@ -179,7 +179,7 @@ $ awx-manage unregister_queue --queuename=<name>
### Configuring Instances and Instance Groups from the API
Instance Groups can be created by posting to `/api/v2/instance_groups` as a System Admin.
Instance Groups can be created by posting to `/api/v2/instance_groups` as a System Administrator.
Once created, `Instances` can be associated with an Instance Group with:
@ -205,12 +205,13 @@ Instance Group Policies are controlled by three optional fields on an `Instance
* `Instances` that are assigned directly to `Instance Groups` by posting to `/api/v2/instance_groups/x/instances` or `/api/v2/instances/x/instance_groups` are automatically added to the `policy_instance_list`. This means they are subject to the normal caveats for `policy_instance_list` and must be manually managed.
* `policy_instance_percentage` and `policy_instance_minimum` work together. For example, if you have a `policy_instance_percentage` of 50% and a `policy_instance_minimum` of 2 and you start 6 `Instances`, 3 of them would be assigned to the `Instance Group`. If you reduce the number of `Instances` to 2 then both of them would be assigned to the `Instance Group` to satisfy `policy_instance_minimum`. In this way, you can set a lower bound on the amount of available resources.
* `policy_instance_percentage` and `policy_instance_minimum` work together. For example, if you have a `policy_instance_percentage` of 50% and a `policy_instance_minimum` of 2 and you start 6 `Instances`, 3 of them would be assigned to the `Instance Group`. If you reduce the number of `Instances` to 2, then both of them would be assigned to the `Instance Group` to satisfy `policy_instance_minimum`. In this way, you can set a lower bound on the amount of available resources.
* Policies don't actively prevent `Instances` from being associated with multiple `Instance Groups`, but this can effectively be achieved by making the percentages sum to 100. If you have 4 `Instance Groups`, assign each a percentage value of 25 and the `Instances` will be distributed among them with no overlap.
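For example, an Instance Group using the policy fields described above could be created with a request along these lines (a sketch; the name, values, and authentication are placeholders):

```shell
~ curl -sik "https://awx.example.org/api/v2/instance_groups/" \
    -X POST \
    -H "Content-Type: application/json" \
    -u admin:password \
    -d '{"name": "branch-a", "policy_instance_percentage": 50, "policy_instance_minimum": 2}'
```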
### Manually Pinning Instances to Specific Groups
If you have a special `Instance` which needs to be _exclusively_ assigned to a specific `Instance Group` but don't want it to automatically join _other_ groups via "percentage" or "minimum" policies:
1. Add the `Instance` to one or more `Instance Group`s' `policy_instance_list`.
@ -243,6 +244,7 @@ Tower itself reports as much status as it can via the API at `/api/v2/ping` in o
A more detailed view of Instances and Instance Groups, including running jobs and membership
information, can be seen at `/api/v2/instances/` and `/api/v2/instance_groups`.
### Instance Services and Failure Behavior
Each Tower instance is made up of several different services working collaboratively:
@ -253,14 +255,14 @@ Each Tower instance is made up of several different services working collaborati
* **RabbitMQ** - A Message Broker, this is used as a signaling mechanism for Celery as well as any event data propagated to the application.
* **Memcached** - A local caching service for the instance it lives on.
Tower is configured in such a way that if any of these services or their components fail, then all services are restarted. If these fail sufficiently often in a short span of time, then the entire instance will be placed offline in an automated fashion in order to allow remediation without causing unexpected behavior.
Tower is configured in such a way that if any of these services or their components fail, then all services are restarted. If these fail sufficiently often in a short span of time, then the entire instance will be placed offline in an automated fashion in order to allow remediation without causing unexpected behavior.
### Job Runtime Behavior
Ideally, a regular user of Tower should not notice any semantic difference in the way jobs are run and reported. Behind the scenes, it is worth pointing out the differences in how the system behaves.
When a job is submitted from the API interface it gets pushed into the Celery queue on RabbitMQ. A single RabbitMQ instance is the responsible master for individual queues, but each Tower instance will connect to and receive jobs from that queue using a Fair scheduling algorithm. Any instance on the cluster is just as likely to receive the work and execute the task. If an instance fails while executing jobs, then the work is marked as permanently failed.
When a job is submitted from the API interface, it gets pushed into the Dispatcher queue on RabbitMQ. A single RabbitMQ instance is the responsible master for individual queues, but each Tower instance will connect to and receive jobs from that queue using a fair-share scheduling algorithm. Any instance on the cluster is just as likely to receive the work and execute the task. If an instance fails while executing jobs, then the work is marked as permanently failed.
If a cluster is divided into separate Instance Groups, then the behavior is similar to the cluster as a whole. If two instances are assigned to a group, then either one is just as likely to receive a job as any other in the same group.
@ -270,60 +272,56 @@ It's important to note that not all instances are required to be provisioned wit
If an Instance Group is configured but all instances in that group are offline or unavailable, any jobs that are launched targeting only that group will be stuck in a waiting state until instances become available. Fallback or backup resources should be provisioned to handle any work that might encounter this scenario.
#### Project synchronization behavior
#### Project Synchronization Behavior
Project updates behave differently than they did before. Previously they were ordinary jobs that ran on a single instance. It's now important that they run successfully on any instance that could potentially run a job. Projects will sync themselves to the correct version on the instance immediately prior to running the job. If the needed revision is already locally checked out and galaxy or collections updates are not needed, then a sync may not be performed.
Project updates behave differently than they did before. Previously they were ordinary jobs that ran on a single instance. It's now important that they run successfully on any instance that could potentially run a job. Projects will sync themselves to the correct version on the instance immediately prior to running the job. If the needed revision is already locally checked out and Galaxy or Collections updates are not needed, then a sync may not be performed.
When the sync happens, it is recorded in the database as a project update with a `launch_type` of "sync" and a `job_type` of "run". Project syncs will not change the status or version of the project; instead, they will update the source tree _only_ on the instance where they run. The only exception to this behavior is when the project is in the "never updated" state (meaning that no project updates of any type have been run), in which case a sync should fill in the project's initial revision and status, and subsequent syncs should not make such changes.
#### Controlling where a particular job runs
#### Controlling Where a Particular Job Runs
By default, a job will be submitted to the `tower` queue, meaning that it can be picked up by any of the workers.
##### How to restrict the instances a job will run on
##### How to Restrict the Instances a Job Will Run On
If any of the job template, inventory,
or organization has instance groups associated with them, a job run from that job template will not be eligible for the default behavior. That means that if all of the instance associated with these three resources are out of capacity, the job will remain in the `pending` state until capacity frees up.
If the Job Template, Inventory, or Organization has instance groups associated with it, a job run from that Job Template will not be eligible for the default behavior. This means that if all of the instances associated with these three resources are out of capacity, the job will remain in the `pending` state until capacity frees up.
##### How to set up a preferred instance group
##### How to Set Up a Preferred Instance Group
The order of preference in determining which instance group to which the job gets submitted is as follows:
The order of preference in determining which instance group the job gets submitted to is as follows:
1. Job Template
2. Inventory
3. Organization (by way of Inventory)
To expand further: If instance groups are associated with the job template and all of them are at capacity, then the job will be submitted to instance groups specified on inventory, and then organization.
To expand further: If instance groups are associated with the Job Template and all of them are at capacity, then the job will be submitted to instance groups specified on Inventory, and then Organization.
The global `tower` group can still be associated with a resource, just like any of the custom instance groups defined in the playbook. This can be used to specify a preferred instance group on the job template or inventory, but still allow the job to be submitted to any instance if those are out of capacity.
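As a sketch (assuming the same association convention used for the `credentials` endpoint elsewhere in these docs; the IDs and credentials below are placeholders), associating an Instance Group with a Job Template might look like:

```shell
~ curl -sik "https://awx.example.org/api/v2/job_templates/42/instance_groups/" \
    -X POST \
    -H "Content-Type: application/json" \
    -u admin:password \
    -d '{"associate": true, "id": 3}'
```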
#### Instance Enable / Disable
In order to support temporarily taking an `Instance` offline there is a boolean property `enabled` defined on each instance.
In order to support temporarily taking an `Instance` offline, there is a boolean property `enabled` defined on each instance.
When this property is disabled no jobs will be assigned to that `Instance`. Existing jobs will finish but no new work will be
assigned.
When this property is disabled, no jobs will be assigned to that `Instance`. Existing jobs will finish but no new work will be assigned.
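Assuming the `enabled` field is writable through the API in your version, temporarily disabling an Instance might look like this sketch (the instance ID and authentication are placeholders):

```shell
~ curl -sik "https://awx.example.org/api/v2/instances/1/" \
    -X PATCH \
    -H "Content-Type: application/json" \
    -u admin:password \
    -d '{"enabled": false}'
```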
## Acceptance Criteria
When verifying acceptance we should ensure the following statements are true
When verifying acceptance, we should ensure that the following statements are true:
* Tower should install as a standalone Instance
* Tower should install in a Clustered fashion
* Instance should, optionally, be able to be grouped arbitrarily into different Instance Groups
* Capacity should be tracked at the group level and capacity impact should make sense relative to what instance a job is
running on and what groups that instance is a member of.
* Instances should, optionally, be able to be grouped arbitrarily into different Instance Groups
* Capacity should be tracked at the group level and capacity impact should make sense relative to what instance a job is running on and what groups that instance is a member of
* Provisioning should be supported via the setup playbook
* De-provisioning should be supported via a management command
* All jobs, inventory updates, and project updates should run successfully
* Jobs should be able to run on hosts which it is targeted. If assigned implicitly or directly to groups then it should
only run on instances in those Instance Groups.
* Jobs should be able to run on hosts for which they are targeted; if assigned implicitly or directly to groups, then they should only run on instances in those Instance Groups
* Project updates should manifest their data on the host that will run the job immediately prior to the job running
* Tower should be able to reasonably survive the removal of all instances in the cluster
* Tower should behave in a predictable fashiong during network partitioning
* Tower should behave in a predictable fashion during network partitioning
## Testing Considerations
@ -331,39 +329,30 @@ When verifying acceptance we should ensure the following statements are true
* Basic playbook testing to verify routing differences, including:
- Basic FQDN
- Short-name name resolution
- ip addresses
- /etc/hosts static routing information
* We should test behavior of large and small clusters. I would envision small clusters as 2 - 3 instances and large
clusters as 10 - 15 instances
* Failure testing should involve killing single instances and killing multiple instances while the cluster is performing work.
Job failures during the time period should be predictable and not catastrophic.
* Instance downtime testing should also include recoverability testing. Killing single services and ensuring the system can
return itself to a working state
* Persistent failure should be tested by killing single services in such a way that the cluster instance cannot be recovered
and ensuring that the instance is properly taken offline
* Network partitioning failures will be important also. In order to test this
- IP addresses
- `/etc/hosts` static routing information
* We should test behavior of large and small clusters; small clusters usually consist of 2 - 3 instances and large clusters have 10 - 15 instances.
* Failure testing should involve killing single instances and killing multiple instances while the cluster is performing work. Job failures during the time period should be predictable and not catastrophic.
* Instance downtime testing should also include recoverability testing (killing single services and ensuring the system can return itself to a working state).
* Persistent failure should be tested by killing single services in such a way that the cluster instance cannot be recovered and ensuring that the instance is properly taken offline.
* Network partitioning failures will also be important. In order to test this:
- Disallow a single instance from communicating with the other instances but allow it to communicate with the database
- Break the link between instances such that it forms 2 or more groups where groupA and groupB can't communicate but all instances
can communicate with the database.
* Crucially when network partitioning is resolved all instances should recover into a consistent state
* Upgrade Testing, verify behavior before and after are the same for the end user.
* Project Updates should be thoroughly tested for all scm types (git, svn, hg) and for manual projects.
- Break the link between instances such that it forms two or more groups where Group A and Group B can't communicate but all instances can communicate with the database.
* Crucially, when network partitioning is resolved, all instances should recover into a consistent state.
* Upgrade Testing - verify behavior before and after are the same for the end user.
* Project Updates should be thoroughly tested for all SCM types (`git`, `svn`, `hg`) and for manual projects.
* Setting up instance groups in two scenarios:
a) instances are shared between groups
b) instances are isolated to particular groups
Organizations, Inventories, and Job Templates should be variously assigned to one or many groups and jobs should execute
in those groups in preferential order as resources are available.
Organizations, Inventories, and Job Templates should be variously assigned to one or many groups and jobs should execute in those groups in preferential order as resources are available.
## Performance Testing
Performance testing should be twofold.
Performance testing should be twofold:
* Large volume of simultaneous jobs.
* Jobs that generate a large amount of output.
* A large volume of simultaneous jobs
* Jobs that generate a large amount of output
These should also be benchmarked against the same playbooks using the 3.0.X Tower release and a stable Ansible version.
For a large volume playbook I might recommend a customer provided one that we've seen recently:
These should also be benchmarked against the same playbooks using the 3.0.X Tower release and a stable Ansible version. For a large volume playbook (*e.g.*, against 100+ hosts), something like the following is recommended:
https://gist.github.com/michelleperz/fe3a0eb4eda888221229730e34b28b89
Against 100+ hosts.

View File

@ -1,19 +1,18 @@
## Collections
AWX supports using Ansible collections.
This section will give ways to use collections in job runs.
AWX supports the use of Ansible Collections. This section will give ways to use Collections in job runs.
### Project Collections Requirements
If you specify a collections requirements file in SCM at `collections/requirements.yml`,
then AWX will install collections in that file in the implicit project sync
If you specify a Collections requirements file in SCM at `collections/requirements.yml`,
then AWX will install Collections in that file in the implicit project sync
before a job run. The invocation is:
```
ansible-galaxy collection install -r requirements.yml -p <job tmp location>
```
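For reference, a minimal `collections/requirements.yml` might look like the following (the collection names and version constraint are placeholders):

```yaml
---
collections:
  - name: my_namespace.my_collection
  - name: community.general
    version: ">=1.0.0"
```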
Example of tmp directory where job is running:
Example of `tmp` directory where job is running:
```
├── project

View File

@ -0,0 +1 @@
This folder contains documentation related to credentials in AWX / Ansible Tower.

View File

@ -2,7 +2,7 @@ Credential Plugins
==================
By default, sensitive credential values (such as SSH passwords, SSH private
keys, API tokens for cloud services) in AWX are stored in the AWX database
keys, API tokens for cloud services, etc.) in AWX are stored in the AWX database
after being encrypted with a symmetric encryption cipher utilizing AES-256 in
CBC mode alongside a SHA-256 HMAC.
@ -19,9 +19,9 @@ When configuring AWX to pull a secret from a third party system, there are
generally three steps.
Here is an example of creating an (1) AWX Machine Credential with
a static username, `example-user` and (2) an externally sourced secret from
a static username, `example-user` and (2) an externally-sourced secret from
HashiCorp Vault Key/Value system which will populate the (3) password field on
the Machine Credential.
the Machine Credential:
1. Create the Machine Credential with a static username, `example-user`.
@ -29,13 +29,13 @@ the Machine Credential.
secret management system (in this example, specifying a URL and an
OAuth2.0 token _to access_ HashiCorp Vault)
3. _Link_ the `password` field for the Machine credential to the external
system by specifying the source (in this example, the HashiCorp credential)
3. _Link_ the `password` field for the Machine Credential to the external
system by specifying the source (in this example, the HashiCorp Credential)
and metadata about the path (e.g., `/some/path/to/my/password/`).
Note that you can perform these lookups on *any* field for any non-external
credential, including those with custom credential types. You could just as
easily create an AWS credential and use lookups to retrieve the Access Key and
easily create an AWS Credential and use lookups to retrieve the Access Key and
Secret Key from an external secret management system. External credentials
cannot have lookups applied to their fields.
@ -150,10 +150,10 @@ HashiCorp Vault KV
AWX supports retrieving secret values from HashiCorp Vault KV
(https://www.vaultproject.io/api/secret/kv/)
The following example illustrates how to configure a Machine credential to pull
its password from an HashiCorp Vault:
The following example illustrates how to configure a Machine Credential to pull
its password from a HashiCorp Vault:
1. Look up the ID of the Machine and HashiCorp Vault Secret Lookup credential
1. Look up the ID of the Machine and HashiCorp Vault Secret Lookup Credential
types (in this example, `1` and `15`):
```shell
@ -182,7 +182,7 @@ HTTP/1.1 200 OK
...
```
2. Create a Machine and a HashiCorp Vault credential:
2. Create a Machine and a HashiCorp Vault Credential:
```shell
~ curl -sik "https://awx.example.org/api/v2/credentials/" \
@ -214,7 +214,7 @@ HTTP/1.1 201 Created
...
```
3. Link the Machine credential to the HashiCorp Vault credential:
3. Link the Machine Credential to the HashiCorp Vault Credential:
```shell
~ curl -sik "https://awx.example.org/api/v2/credentials/1/input_sources/" \
@ -232,10 +232,10 @@ HashiCorp Vault SSH Secrets Engine
AWX supports signing public keys via HashiCorp Vault's SSH Secrets Engine
(https://www.vaultproject.io/api/secret/ssh/)
The following example illustrates how to configure a Machine credential to sign
The following example illustrates how to configure a Machine Credential to sign
a public key using HashiCorp Vault:
1. Look up the ID of the Machine and HashiCorp Vault Signed SSH credential
1. Look up the ID of the Machine and HashiCorp Vault Signed SSH Credential
types (in this example, `1` and `16`):
```shell
@ -263,7 +263,7 @@ HTTP/1.1 200 OK
"name": "HashiCorp Vault Signed SSH",
```
2. Create a Machine and a HashiCorp Vault credential:
2. Create a Machine and a HashiCorp Vault Credential:
```shell
~ curl -sik "https://awx.example.org/api/v2/credentials/" \
@ -295,7 +295,7 @@ HTTP/1.1 201 Created
...
```
3. Link the Machine credential to the HashiCorp Vault credential:
3. Link the Machine Credential to the HashiCorp Vault Credential:
```shell
~ curl -sik "https://awx.example.org/api/v2/credentials/1/input_sources/" \
@ -306,7 +306,7 @@ HTTP/1.1 201 Created
HTTP/1.1 201 Created
```
4. Associate the Machine credential with a Job Template. When the Job Template
4. Associate the Machine Credential with a Job Template. When the Job Template
is run, AWX will use the provided HashiCorp URL and token to sign the
unsigned public key data using the HashiCorp Vault SSH Secrets API.
AWX will generate an `id_rsa` and `id_rsa-cert.pub` on the fly and

View File

@ -27,7 +27,7 @@ Important Changes
By utilizing these custom ``Credential Types``, customers have the ability to
define custom "Cloud" and "Network" ``Credential Types`` which
modify environment variables, extra vars, and generate file-based
credentials (such as file-based certificates or .ini files) at
credentials (such as file-based certificates or `.ini` files) at
`ansible-playbook` runtime.
* Multiple ``Credentials`` can now be assigned to a ``Job Template`` as long as
@ -136,9 +136,10 @@ ordered fields for that type:
"multiline": false # if true, the field should be rendered
# as multi-line for input entry
# (only applicable to `type=string`)
"default": "default value" # optional, can be used to provide a
# default value if the field is left empty
# when creating a credential of this type
# default value if the field is left empty;
# when creating a credential of this type,
# credential forms will use this value
# as a prefill when making credentials of
# this type
@ -164,7 +165,7 @@ When `type=string`, fields can optionally specify multiple choice options:
Defining Custom Credential Type Injectors
-----------------------------------------
A ``Credential Type`` can inject ``Credential`` values through the use
of the Jinja templating language (which should be familiar to users of Ansible):
of the [Jinja templating language](https://jinja.palletsprojects.com/en/2.10.x/) (which should be familiar to users of Ansible):
"injectors": {
"env": {
@ -175,7 +176,7 @@ of the Jinja templating language (which should be familiar to users of Ansible):
}
}
``Credential Types`` can also generate temporary files to support .ini files or
``Credential Types`` can also generate temporary files to support `.ini` files or
certificate/key data:
"injectors": {
@ -274,7 +275,7 @@ Additional Criteria
Acceptance Criteria
-------------------
When verifying acceptance we should ensure the following statements are true:
When verifying acceptance, the following statements should be true:
* `Credential` injection for playbook runs, SCM updates, inventory updates, and
ad-hoc runs should continue to function as they did prior to Tower 3.2 for the
@ -290,15 +291,15 @@ When verifying acceptance we should ensure the following statements are true:
* Users should not be able to use the syntax for injecting single and
multiple files in the same custom credential.
* The default `Credential Types` included with Tower in 3.2 should be
non-editable/readonly and cannot be deleted by any user.
non-editable/read-only and unable to be deleted by any user.
* Stored `Credential` values for _all_ types should be consistent before and
after Tower 3.2 migration/upgrade.
after a Tower 3.2 migration/upgrade.
* `Job Templates` should be able to specify multiple extra `Credentials` as
defined in the constraints in this document.
* Custom inventory sources should be able to specify a cloud/network
`Credential` and they should properly update the environment (environment
variables, extra vars, written files) when an inventory source update runs.
* If a `Credential Type` is being used by one or more `Credentials`, the fields
defined in its ``inputs`` should be read-only.
* `Credential Types` should support activity stream history for basic object
defined in its `inputs` should be read-only.
* `Credential Types` should support Activity Stream history for basic object
modification.

View File

@ -1,38 +1,32 @@
Multi-Credential Assignment
===========================
awx has added support for assigning zero or more credentials to
JobTemplates and InventoryUpdates via a singular, unified interface.
AWX has added support for assigning zero or more credentials to Job Templates and Inventory Updates via a singular, unified interface.
Background
----------
Prior to awx (Tower 3.2), Job Templates had a certain set of requirements
surrounding their relation to Credentials:
Prior to AWX (Tower 3.2), Job Templates had a certain set of requirements surrounding their relation to Credentials:
* All Job Templates (and Jobs) were required to have exactly *one* Machine/SSH
or Vault credential (or one of both).
* All Job Templates (and Jobs) could have zero or more "extra" Credentials.
* These extra Credentials represented "Cloud" and "Network" credentials that
* could be used to provide authentication to external services via environment
* variables (e.g., AWS_ACCESS_KEY_ID).
  could be used to provide authentication to external services via environment
  variables (*e.g.*, `AWS_ACCESS_KEY_ID`).
This model required a variety of disjoint interfaces for specifying Credentials
on a JobTemplate. For example, to modify assignment of Machine/SSH and Vault
credentials, you would change the Credential key itself:
This model required a variety of disjoint interfaces for specifying Credentials on a Job Template. For example, to modify assignment of Machine/SSH and Vault credentials, you would change the Credential key itself:
`PATCH /api/v2/job_templates/N/ {'credential': X, 'vault_credential': Y}`
Modifying `extra_credentials` was accomplished on a separate API endpoint
via association/disassociation actions:
Modifying `extra_credentials` was accomplished on a separate API endpoint via association/disassociation actions:
```
POST /api/v2/job_templates/N/extra_credentials {'associate': true, 'id': Z}
POST /api/v2/job_templates/N/extra_credentials {'disassociate': true, 'id': Z}
```
This model lacked the ability associate multiple Vault credentials with
a playbook run, a use case supported by Ansible core from Ansible 2.4 onwards.
This model lacked the ability to associate multiple Vault credentials with a playbook run, a use case supported by Ansible core from Ansible 2.4 onwards.
This model also was a stumbling block for certain playbook execution workflows.
For example, some users wanted to run playbooks with `connection:local` that
@ -40,10 +34,11 @@ only interacted with some cloud service via a cloud Credential. In this
scenario, users often generated a "dummy" Machine/SSH Credential to attach to
the Job Template simply to satisfy the requirement on the model.
Important Changes
-----------------
JobTemplates now have a single interface for Credential assignment:
Job Templates now have a single interface for Credential assignment:
`GET /api/v2/job_templates/N/credentials/`
@ -56,13 +51,12 @@ POST /api/v2/job_templates/N/credentials/ {'associate': true, 'id': X'}
POST /api/v2/job_templates/N/credentials/ {'disassociate': true, 'id': Y'}
```
Under this model, a JobTemplate is considered valid even when it has _zero_
Credentials assigned to it.
Under this model, a Job Template is considered valid even when it has _zero_ Credentials assigned to it.
Launch Time Considerations
--------------------------
Prior to this change, JobTemplates had a configurable attribute,
Prior to this change, Job Templates had a configurable attribute,
`ask_credential_on_launch`. This value was used at launch time to determine
which missing credential values were necessary for launch - this was primarily
used as a mechanism for users to specify an SSH (or Vault) credential to satisfy
@ -72,7 +66,7 @@ Under the new unified Credential list model, this attribute still exists, but it
is no longer bound to a notion of "requiring" a Credential. Now when
`ask_credential_on_launch` is `True`, it signifies that users may (if they
wish) specify a list of credentials at launch time to override those defined on
the JobTemplate:
the Job Template:
`POST /api/v2/job_templates/N/launch/ {'credentials': [A, B, C]}`
@ -166,18 +160,18 @@ Specifying Multiple Vault Credentials
One interesting use case supported by the new "zero or more credentials" model
is the ability to assign multiple Vault credentials to a Job Template run.
This specific use case covers Ansible's support for multiple vault passwords for
This specific use case covers Ansible's support for multiple Vault passwords for
a playbook run (since Ansible 2.4):
http://docs.ansible.com/ansible/latest/vault.html#vault-ids-and-multiple-vault-passwords
Vault credentials in awx now have an optional field, `vault_id`, which is
Vault credentials in AWX now have an optional field, `vault_id`, which is
analogous to the `--vault-id` argument to `ansible-playbook`. To run
a playbook which makes use of multiple vault passwords:
a playbook which makes use of multiple Vault passwords:
1. Make a Vault credential in Tower for each vault password; specify the Vault
1. Make a Vault credential in Tower for each Vault password; specify the Vault
ID as a field on the credential and input the password (which will be
encrypted and stored).
2. Assign multiple vault credentials to the job template via the new
2. Assign multiple Vault credentials to the job template via the new
`credentials` endpoint:
```
@ -188,7 +182,7 @@ a playbook which makes use of multiple vault passwords:
'id': X
}
```
3. Launch the job template, and `ansible-playbook` will be invoked with
3. Launch the Job Template, and `ansible-playbook` will be invoked with
multiple `--vault-id` arguments.
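As a rough illustration of the resulting CLI shape (the vault IDs below are placeholders, and exactly how the stored passwords are supplied to each `--vault-id` is an implementation detail):

```shell
ansible-playbook site.yml --vault-id dev@prompt --vault-id prod@prompt
```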
Prompted Vault Credentials
@ -223,11 +217,11 @@ Inventory Source Credentials
----------------------------
Inventory sources and inventory updates that they spawn also use the same
relationship. The new endpoints for this are
relationship. The new endpoints for this are:
- `/api/v2/inventory_sources/N/credentials/` and
- `/api/v2/inventory_updates/N/credentials/`
Most cloud sources will continue to adhere to the constraint that they
must have a single credential that corresponds to their cloud type.
However, this relationship allows users to associate multiple vault
credentials of different ids to inventory sources.
credentials of different IDs to inventory sources.

View File

@ -5,29 +5,29 @@ Django Debug Toolbar (DDT)
----------------
This is a useful tool for examining SQL queries, performance, headers, requests, signals, cache, logging, and more.
To enable DDT, you need to set your INTERNAL_IPS to the IP address of your load balancer. This can be overriden in `local_settings`.
This IP address can be found by making a GET to any page on the browsable API and looking for a like this in the standard output.
To enable DDT, you need to set your `INTERNAL_IPS` to the IP address of your load balancer. This can be overridden in `local_settings`.
This IP address can be found by making a GET to any page on the browsable API and looking for a line like this in the standard output:
```
awx_1 | 14:42:08 uwsgi.1 | 172.18.0.1 GET /api/v2/tokens/ - HTTP/1.1 200
```
Whitelist this IP address by adding it to the INTERNAL_IPS variable in local_settings, then navigate to the API and you should see DDT on the
right side. If you don't see it, make sure `DEBUG=True`.
> Note that enabling DDT is detrimental to the performance of AWX and adds overhead to every API request. It is
Whitelist this IP address by adding it to the `INTERNAL_IPS` variable in `local_settings`, then navigate to the API and you should see DDT on the
right side. If you don't see it, make sure to set `DEBUG=True`.
> Note that enabling DDT is detrimental to the performance of AWX and adds overhead to every API request. It is
recommended to keep this turned off when you are not using it.
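A minimal sketch of the relevant `local_settings` entries (assuming the development `local_settings.py` override; substitute the IP address you observed in the log line above):

```python
# local_settings.py -- development override (sketch)
DEBUG = True

# Whitelist the load balancer address seen in the uwsgi request log
INTERNAL_IPS = ["172.18.0.1"]
```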
SQL Debugging
-------------
AWX includes a powerful tool for tracking slow queries across all of its Python processes.
As the awx user, run:
As the AWX user, run:
```
$ awx-manage profile_sql --threshold 2 --minutes 5
```
...where `threshold` is the max query time in seconds, and `minutes` is the number of minutes to record.
For the next five minutes (in this example), any awx Python process that generates a SQL query
that runs for >2s will be recorded in a .sqlite database in /var/log/tower/profile
For the next five minutes (in this example), any AWX Python process that generates a SQL query
that runs for >2s will be recorded in a `.sqlite` database in `/var/log/tower/profile`.
This is a useful tool for logging all queries at a per-process level, or filtering and searching for
queries within a certain code branch. For example, if you observed that certain HTTP requests were
@ -38,7 +38,7 @@ $ sqlite3 -column -header /var/log/tower/profile/uwsgi.sqlite
sqlite> .schema queries
CREATE TABLE queries (
id INTEGER PRIMARY KEY,
version TEXT, # the awx version
version TEXT, # the AWX version
pid INTEGER, # the pid of the process
stamp DATETIME DEFAULT CURRENT_TIMESTAMP,
argv REAL, # argv of the process
@ -90,7 +90,7 @@ a `continue` command.
To simplify remote debugging session management, Tower's development
environment comes with tooling that can automatically discover open
remote debugging sessions and automatically connect to them. From your *host*
machine (i.e., _outside_ of the development container), you can run:
machine (*i.e.*, _outside_ of the development container), you can run:
```
sdb-listen

View File

@ -1,4 +1,5 @@
# AWX as an Ansible Fact Cache
AWX can store and retrieve per-host facts via an Ansible Fact Cache Plugin.
This behavior is configurable on a per-job-template basis. When enabled, AWX
will serve fact requests for all Hosts in an Inventory related to the Job
@ -7,14 +8,14 @@ having access to the entire Inventory of Host facts.
## AWX Fact Cache Implementation Details
### AWX Injection
In order to understand the behavior of AWX as a fact cache you will need to
In order to understand the behavior of AWX as a fact cache, you will need to
understand how fact caching is achieved in AWX. When a Job launches with
`use_fact_cache=True`, AWX will write all `ansible_facts` associated with
each Host in the associated Inventory as JSON files on the local file system
(one JSON file per host). Jobs invoked with `use_fact_cache=False` will not
write `ansible_facts` files.
### Ansible plugin usage
### Ansible Plugin Usage
When `use_fact_cache=True`, Ansible will be configured to use the `jsonfile`
cache plugin. Any `get()` call to the fact cache interface in Ansible will
result in a JSON file lookup for the host-specific set of facts. Any `set()`
@ -30,13 +31,13 @@ subsequent playbook runs, AWX will _only_ inject cached facts that are _newer_
than `settings.ANSIBLE_FACT_CACHE_TIMEOUT` seconds.
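For orientation, this is roughly how the `jsonfile` cache plugin is configured in plain Ansible; AWX points equivalent settings at the per-job directory of host fact files it writes (the exact variables and path are an implementation detail and are shown here only as an assumption):

```shell
export ANSIBLE_CACHE_PLUGIN=jsonfile
export ANSIBLE_CACHE_PLUGIN_CONNECTION=/path/to/private_data_dir/facts  # placeholder path
```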
## AWX Fact Logging
New and changed facts will be logged via AWX's logging facility. Specifically,
New and changed facts will be logged via AWX's logging facility, specifically
to the `system_tracking` namespace or logger. The logging payload will include
the fields: `host_name`, `inventory_id`, and `ansible_facts`. Where
`ansible_facts` is a dictionary of all ansible facts for `host_name` in AWX
the fields `host_name`, `inventory_id`, and `ansible_facts`, where
`ansible_facts` is a dictionary of all Ansible facts for `host_name` in AWX
Inventory `inventory_id`.
## Integration Testing
* ensure `clear_facts` set's `hosts/<id>/ansible_facts` to `{}`
* ensure that `gather_facts: False` does NOT result in clearing existing facts
* ensure that the when a host fact timeout is reached, that the facts are not used from the cache
* Ensure `clear_facts` sets `hosts/<id>/ansible_facts` to `{}`.
* Ensure that `gather_facts: False` does NOT result in clearing existing facts.
* Ensure that when a host fact timeout is reached, the facts are not used from the cache.

View File

@ -1,15 +1,23 @@
# Insights Integration
Insights provides remediation playbooks that Tower executes. Tower also provides a view of Insights discovered misconfigurations and problems via proxying Insights API requests. Credentials to access the Insights API are stored in Tower and can be related to an Inventory in which Insights hosts are presumed to exist. This same Insights Credential can be attached to a Tower Project. The Tower Project will pull the Insights rememdiation plans whenever a Project Update is ran.
Insights provides remediation playbooks that Tower executes. Tower also provides a view of Insights-discovered misconfigurations and problems via proxying Insights API requests. Credentials to access the Insights API are stored in Tower and can be related to an Inventory in which Insights hosts are presumed to exist. This same Insights Credential can be attached to a Tower Project. The Tower Project will pull the Insights remediation plans whenever a Project Update runs.
## Tower Insights Credential
Tower has an Insights Credential to store the username and password to be used when accessing the Insights API. The Insights Credential is a new `CredentialType` in the Tower system. The Insights Credential can be associated with an Insights Project and any non-smart Inventory.
## Tower Recognized Insights Host
Tower considers a Host an Insights Host when the attribute `insights_system_id` on the Host is set to something other than `null`. The `insights_system_id` is used to identify the host in the Insights system when making proxied requests to the Insights API. The `insights_system_id` is set as a result of the fact scan playbook being ran. Specifically, as a result of the `scan_insights` Ansible module being called. The `scan_insights` module will read the value from the file `/etc/redhat-access-insights/machine-id` on a host. If found, the value will be assigned to the `insights_system_id` for that host. The host would then be considered an Insights host.
Tower considers a host an Insights Host when the attribute `insights_system_id` on the host is set to something other than `null`. The `insights_system_id` is used to identify the host in the Insights system when making proxied requests to the Insights API. The `insights_system_id` is set as a result of the fact scan playbook being run (specifically, as a result of the `scan_insights` Ansible module being called). The `scan_insights` module will read the value from the file `/etc/redhat-access-insights/machine-id` on a host. If found, the value will be assigned to the `insights_system_id` for that host. The host would then be considered an Insights host.
## Tower Insights Proxy
Insights data for a Tower recognized Insights Host can be gotten from the insights endpoint hanging off the host detail endpoint. Each time the insights endpoint has a `GET` request issued to it the backend issues a request to the Insights API for the `insights_system_id`. The response is then returned in the same get/response cycle. The response contains Insights details for the host including (1) the current plans, (2) reports, and (3) rules.
Insights data for a Tower-recognized Insights Host can be acquired from the Insights endpoint hanging off of the host detail endpoint. Each time the Insights endpoint has a `GET` request issued to it, the backend issues a request to the Insights API for the `insights_system_id`. The response is then returned in the same request/response cycle. The response contains Insights details for the host, including (1) the current plans, (2) reports, and (3) rules.
`/api/v2/hosts/<id>/insights/`
## Tower Insights Project
Tower has a Project exclusively for Insights. Projects of type Insights can attach a special Insights Credential. The Insights Credential is used for accessing the Insights API in the Project Update playbook. The `scm_revision` for an Insights Project differs from traditional SCM backed projects. The `scm_revision` is the Insights `ETag` http header value returned when making a plan requests to the Insights API during a Project update run. The `ETag` communications a version derived from the response data. During a Project update, the Project's `scm_revision` will be updated with the new `ETag`. The `ETag` will also be written to disk in the Project directory as `.scm_revision`. The Project update will download the remediation playbooks if the `.scm_revision` does not equal the `ETag`.
Tower has a Project exclusively for Insights. Projects of type Insights can attach a special Insights Credential. The Insights Credential is used for accessing the Insights API in the Project Update playbook. The `scm_revision` for an Insights Project differs from traditional SCM-backed projects. The `scm_revision` is the Insights `ETag` HTTP header value returned when making plan requests to the Insights API during a Project Update run. The `ETag` communicates a version derived from the response data. During a Project Update, the Project's `scm_revision` will be updated with the new `ETag`. The `ETag` will also be written to disk in the Project directory as `.scm_revision`. The Project Update will download the remediation playbooks if the `.scm_revision` does not equal the `ETag`.

1
docs/inventory/README.md Normal file
View File

@ -0,0 +1 @@
This folder contains documentation related to inventories in AWX / Ansible Tower.

View File

@ -1,61 +1,70 @@
# Transition to Ansible Inventory Plugins
Inventory updates have changed from using scripts which are vendored as executable Python scripts to using dynamically-generated YAML files which conform to the specifications of the `auto` inventory plugin. These are then parsed by their respective inventory plugin.
Inventory updates have changed from using scripts, which are vendored as executable Python scripts, to using dynamically-generated YAML files which conform to the specifications of the `auto` inventory plugin. These are then parsed by their respective inventory plugin.
The major organizational change is that the inventory plugins are part of the Ansible core distribution, whereas the same logic used to be a part of AWX source.
## Prior Background for Transition
AWX used to maintain logic that parsed `.ini` inventory file contents, in addition to interpreting the JSON output of scripts, re-calling with the `--host` option in cases where the `_meta.hostvars` key was not provided.
### Switch to Ansible Inventory
The CLI entry point `ansible-inventory` was introduced in Ansible 2.4. In Tower 3.2, inventory imports began running this command as an intermediary between the inventory and the import's logic to save content to database. Using `ansible-inventory` eliminates the need to maintain source-specific logic, relying on Ansible's code instead. This also allows us to count on a consistent data structure outputted from `ansible-inventory`. There are many valid structures that a script can provide, but the output from `ansible-inventory` will always be the same, thus the AWX logic to parse the content is simplified. This is why even scripts must be ran through the `ansible-inventory` CLI.
The CLI entry point `ansible-inventory` was introduced in Ansible 2.4. In Tower 3.2, inventory imports began running this command as an intermediary between the inventory and the import's logic to save content to the database. Using `ansible-inventory` eliminates the need to maintain source-specific logic, relying on Ansible's code instead. This also enables consistent data structure output from `ansible-inventory`. There are many valid structures that a script can provide, but the output from `ansible-inventory` will always be the same, thus the AWX logic to parse the content is simplified. This is why even scripts must be run through the `ansible-inventory` CLI.
Along with this switchover, a backported version of `ansible-inventory` was provided, which supports Ansible versions 2.2 and 2.3.
### Removal of Backport
In AWX 3.0.0 (and Tower 3.5), the backport of `ansible-inventory` was removed, and support for using custom virtual environments was added. This set the minimum version of Ansible necessary to run _any_ inventory update to 2.4.
## Inventory Plugin Versioning
Beginning in Ansible 2.5, inventory sources in Ansible started migrating away from "contrib" scripts (meaning they lived in the contrib folder) to the inventory plugin model.
Beginning in Ansible 2.5, inventory sources in Ansible started migrating away from `contrib` scripts (meaning they lived in the `contrib` folder) to the inventory plugin model.
In AWX 4.0.0 (and Tower 3.5) inventory source types start to switchover to plugins, provided that sufficient compatibility is in place for the version of Ansible present in the local virtualenv.
In AWX 4.0.0 (and Tower 3.5) inventory source types start to switch over to plugins, provided that sufficient compatibility is in place for the version of Ansible present in the local virtualenv.
To see what version the plugin transition will happen, see `awx/main/models/inventory.py` and look for the source name as a subclass of `PluginFileInjector`, and there should be an `initial_version` which is the first version that testing deemed to have sufficient parity in the content its inventory plugin returns. For example, `openstack` will begin using the inventory plugin in Ansible version 2.8. If you run an openstack inventory update with Ansible 2.7.x or lower, it will use the script.
### Sunsetting the scripts
To see in which version the plugin transition will happen, see `awx/main/models/inventory.py` and look for the source name as a subclass of `PluginFileInjector`; there should be an `initial_version`, which is the first version that was deemed (via testing) to have sufficient parity in the content its inventory plugin returns. For example, `openstack` will begin using the inventory plugin in Ansible version 2.8. If you run an OpenStack inventory update with Ansible 2.7.x or lower, it will use the script.
The eventual goal is for all source types to have moved to plugins. For any given source, after the `initial_version` for plugin use is higher than the lowest supported Ansible version, the script can be removed and the logic for script credential injection will also be removed.
For example, after AWX no longer supports Ansible 2.7, the script `awx/plugins/openstack_inventory.py` will be removed.
## Changes to Expect in Imports
An effort was made to keep imports working in the exact same way after the switchover. However, the inventory plugins are a fundamental rewrite and many elements of default behavior have changed. These changes also include many backward-incompatible changes. Because of this, what you get via an inventory import will be a superset of what you get from the script but will not match the default behavior you would get from the inventory plugin on the CLI.
Because the inventory plugins add additional variables, if you downgrade Ansible, you should turn on `overwrite` and `overwrite_vars` to get rid of stale variables (and potentially groups) no longer returned by the import.
### Changes for Compatibility
Programatically-generated examples of inventory file syntax used in updates (with dummy data) can be found in `awx/main/tests/data/inventory/scripts`, these demonstrate the inventory file syntax used to restore old behavior from the inventory scripts.
Programmatically-generated examples of the inventory file syntax used in updates (with dummy data) can be found in `awx/main/tests/data/inventory/scripts`. These demonstrate the inventory file syntax used to restore the old behavior of the inventory scripts.
#### Hostvar Keys and Values
More hostvars will appear if the inventory plugins are used. To maintain backward compatibility, the old names are added back where they have the same meaning as a variable returned by the plugin. New names are not removed.
More `hostvars` will appear if the inventory plugins are used. To maintain backward compatibility, the old names are added back where they have the same meaning as a variable returned by the plugin. New names are not removed.
A small number of `hostvars` will be lost because of general deprecation needs.
A small number of hostvars will be lost because of general deprecation needs.
#### Host Names
In many cases, the host names will change. In all cases, accurate host tracking will still be maintained via the host `instance_id`. (after: https://github.com/ansible/awx/pull/3362)
In many cases, the host names will change. In all cases, accurate host tracking will still be maintained via the host `instance_id`.
## Writing Your Own Inventory File
If you do not want any of this compatibility-related functionality, then you can add an SCM inventory source that points to your own file. You can also apply a credential of a `managed_by_tower` type to that inventory source that matches the credential you are using, as long as it is not `gce` or `openstack`.
All other sources provide _secrets_ via environment variables. These can be re-used without any problems for SCM-based inventory, and your inventory file can be used securely to specify non-sensitive configuration details such as the `keyed_groups` (to provide) or hostvars (to construct).
All other sources provide _secrets_ via environment variables. These can be re-used without any problems for SCM-based inventory, and your inventory file can be used securely to specify non-sensitive configuration details such as the `keyed_groups` (to provide) or `hostvars` (to construct).
## Notes on Technical Implementation of Injectors
@ -72,6 +81,7 @@ The way this data is applied to the environment (including files and environment
With plugins, the inventory file may reference files that contain secrets from the credential. With scripts, typically an environment variable will reference a filename that contains a ConfigParser-format file with parameters for the update, possibly including fields from the credential.
**Caution:** Please do not put secrets from the credential into the inventory file for the plugin. Right now there appears to be no need to do this, and by using environment variables to specify secrets, this keeps open the possibility of showing the inventory file contents to the user as a later enhancement.
Setup logic for inventory updates, for both plugins and scripts, lives in the inventory injector class specific to the source type.
View File
@ -15,16 +15,16 @@ from `InventorySource` completely in Tower 3.3. As a result the related field on
Facts generated by an Ansible playbook during a Job Template run are stored by Tower into the database
whenever `use_fact_cache=True` is set per-Job-Template. New facts are merged with existing
facts and are per-host. These stored facts can be used to filter hosts via the
`/api/v2/hosts` endpoint, using the GET query parameter `host_filter` i.e.
`/api/v2/hosts` endpoint, using the GET query parameter `host_filter` *i.e.*,
`/api/v2/hosts?host_filter=ansible_facts__ansible_processor_vcpus=8`
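For illustration, a client could issue such a filtered query with the Python `requests` library; the Tower URL and token below are placeholders:

```python
# A minimal sketch of querying the host list with a fact-based host_filter.
# The Tower URL and token are placeholders.
import requests

TOWER_URL = "https://tower.example.com"
HEADERS = {"Authorization": "Bearer REPLACE_WITH_TOKEN"}

response = requests.get(
    f"{TOWER_URL}/api/v2/hosts/",
    headers=HEADERS,
    params={"host_filter": "ansible_facts__ansible_processor_vcpus=8"},
)
response.raise_for_status()
for host in response.json()["results"]:
    print(host["name"])
```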
The grammer of `host_filter` allows for:
The grammar of `host_filter` allows for:
* grouping via `()`
* the boolean `and` operator
* `__` to reference related fields in relational fields
* `__` is used on `ansible_facts` to separate keys in a JSON key path
* `[]` is used to denote a json array in the path specification
* `""` can be used in the value when spaces are wanted in the value
* `[]` is used to denote a JSON array in the path specification
* `""` can be used in the value when spaces are utilized
* "classic" Django queries may be embedded in the `host_filter`
Examples:
@ -40,16 +40,15 @@ Examples:
## Smart Inventory
Starting in Tower 3.2, Tower will support the ability to define a _Smart Inventory_.
You will define the inventories using the same language we currently support
in our _Smart Search_.
Users will define the inventories using the same language that is currently supported
in _Smart Search_.
### Inventory Changes
* The `Inventory` model has a new field called `kind`. The default of this field will be blank
for normal inventories and set to `smart` for smart inventories.
* `Inventory` model has a new field called `host_filter`. The default of this field will be blank
for normal inventories. When `host_filter` is set AND the inventory `kind` is set to `smart`
is the combination that makes a _Smart Inventory_.
for normal inventories. When `host_filter` is set AND the inventory `kind` is set to `smart`, this combination makes a _Smart Inventory_.
* `Host` model has a new field called `smart_inventories`. This field uses the `SmartInventoryMemberships`
lookup table to provide a set of all of the _Smart Inventories_ a host is a part of. The memberships
@ -58,12 +57,12 @@ are generated by the `update_host_smart_inventory_memberships` task. The task is
* Existing Host is changed (update/delete).
* New Smart Inventory is added.
* Existing Smart Inventory is changed (update/delete).
* NOTE: This task is only run if the `AWX_REBUILD_SMART_MEMBERSHIP` is set to True. It defaults to False.
* **NOTE:** This task is only run if the `AWX_REBUILD_SMART_MEMBERSHIP` setting is set to `True`. It defaults to `False`.
### Smart Filter (host_filter)
### Smart Filter (`host_filter`)
The `SmartFilter` class handles our translation of the smart search string. We store the
filter value in the `host_filter` field for an inventory. This value should be expressed
the same way we express our existing smart searches.
the same way as existing smart searches.
host_filter="search=foo"
host_filter="group__search=bar"
@ -82,12 +81,12 @@ Creating a new _Smart Inventory_ for all of our GCE and EC2 groups might look li
}
### More On Searching
The `host_filter` you set will search over the entirety of the hosts you have
access to in Tower. If you want to restrict your search in anyway, you will
want to declare that in your host filter.
The `host_filter` that is set will search over the entirety of the hosts the user has
access to in Tower. If the user wants to restrict their search in any way, they will
want to declare that in their host filter.
For example, if you want to restrict the search to only hosts in an inventory
named "US-East", you would create a `host_filter` that looked something like this:
For example, to restrict the search to only hosts in an inventory
named "US-East", create a `host_filter` that looks something like this:
{
"name": "NYC Hosts",
@ -96,12 +95,13 @@ named "US-East", you would create a `host_filter` that looked something like thi
...
}
In the above example, you are limiting the search to the "US-East" inventory and
In the above example, the search is limited to the "US-East" inventory and
hosts with a name containing "nyc".
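A minimal sketch of creating such a Smart Inventory through the API, assuming a placeholder Tower URL, token, and organization ID:

```python
# A minimal sketch of creating the Smart Inventory described above.
# The URL, token, and organization ID are placeholders, and the host_filter
# value is illustrative; its exact syntax follows the Smart Search grammar.
import requests

TOWER_URL = "https://tower.example.com"
HEADERS = {"Authorization": "Bearer REPLACE_WITH_TOKEN"}

payload = {
    "name": "NYC Hosts",
    "organization": 1,  # placeholder organization ID
    "kind": "smart",    # blank for normal inventories
    "host_filter": "inventory__name=US-East and name__icontains=nyc",
}
response = requests.post(f"{TOWER_URL}/api/v2/inventories/", json=payload, headers=HEADERS)
response.raise_for_status()
print(response.json()["id"])
```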
### Acceptance Criteria
When verifying acceptance we should ensure the following statements are true:
When verifying acceptance, ensure the following statements are true:
* `Inventory` has a new field named `kind` that defaults to empty and
@ -116,23 +116,21 @@ search that is set in the `host_filter`.
* Not allow creation of Inventory Sources
### API Concerns
There are no breaking or backwards incompatible changes for this feature.
There are no breaking or backwards-incompatible changes for this feature.
## Other Changes
### Inventory update all inventory_sources
### Inventory Updates All `inventory_sources`
A new endpoint `/api/v2/inventories/:id/update_inventory_sources` has been added. This endpoint
functions in the same way that `/api/v2/inventory_source/:id/update` functions for a single
`InventorySource` with the exception that it updates all of the inventory sources for the
`Inventory`.
`HTTP GET /api/v2/inventories/:id/update_inventory_sources` will list all of the inventory
sources and if they will be updated when a POST to the same endpoint is made. The result of
sources and whether or not they will be updated when a POST to the same endpoint is made. The result of
this request will look like this:
> *Note:* All manual inventory sources (source='') will be ignored by the update_inventory_sources endpoint.
{
results: [
"inventory_source": 1, "can_update": True,
@ -140,7 +138,9 @@ this request will look like this:
]
}
When making a POST to the same endpoint, the response will contain a status as well as the job ID for the update.
> *Note:* All manual inventory sources (`source=''`) will be ignored by the `update_inventory_sources` endpoint.
When making a POST to the same endpoint, the response will contain a status as well as the job ID for the update:
POST /api/v2/inventories/:id/update_inventory_sources
@ -152,7 +152,7 @@ When making a POST to the same endpoint, the response will contain a status as w
}
Response code from this action will be:
The response code from this action will be:
- 200 if all inventory source updates were successful
- 202 if some inventory source updates were successful, but some failed
@ -160,8 +160,6 @@ Response code from this action will be:
- 400 if there are no inventory sources in the inventory
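A minimal sketch of exercising this endpoint with the Python `requests` library (the URL, token, and inventory ID are placeholders):

```python
# A minimal sketch of listing, then triggering, updates of all inventory
# sources for one inventory. URL, token, and inventory ID are placeholders.
import requests

TOWER_URL = "https://tower.example.com"
HEADERS = {"Authorization": "Bearer REPLACE_WITH_TOKEN"}
INVENTORY_ID = 42  # placeholder

endpoint = f"{TOWER_URL}/api/v2/inventories/{INVENTORY_ID}/update_inventory_sources/"

# GET lists each source and whether it can currently be updated.
for entry in requests.get(endpoint, headers=HEADERS).json()["results"]:
    print(entry["inventory_source"], entry["can_update"])

# POST launches the updates; the 200/202/400 semantics are described above.
response = requests.post(endpoint, headers=HEADERS)
print(response.status_code, response.json())
```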
### Background deletion of Inventory
### Background Deletion of Inventory
If a DELETE request is submitted to an inventory, the field `pending_delete` will be `True` until a separate task finishes deleting the inventory and all of its contents.
### InventorySource Hosts and Groups read-only
View File
@ -9,9 +9,9 @@ Fields that should be specified on creation of SCM inventory source:
- `source_project` - project to use
- `source_path` - relative path inside of the project indicating a
directory or a file, if left blank, "" is still a relative path
directory or a file; if left blank, `""` is still a relative path
indicating the root directory of the project
- the `source` field should be set to "scm"
- the `source` field should be set to `"scm"`
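A minimal sketch of creating such an inventory source with the fields above (the IDs, URL, and token are placeholders):

```python
# A minimal sketch of creating an SCM inventory source via the API.
# The inventory and project IDs, URL, and token are placeholders.
import requests

TOWER_URL = "https://tower.example.com"
HEADERS = {"Authorization": "Bearer REPLACE_WITH_TOKEN"}

payload = {
    "name": "inventory from project",   # illustrative name
    "inventory": 10,                    # placeholder inventory ID
    "source": "scm",
    "source_project": 7,                # placeholder project ID
    "source_path": "inventories/prod",  # relative path inside the project; "" means the project root
}
response = requests.post(f"{TOWER_URL}/api/v2/inventory_sources/", json=payload, headers=HEADERS)
response.raise_for_status()
print(response.json()["id"])
```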
Additionally:
@ -30,27 +30,29 @@ in turn, trigger an update of the inventory source.
Also, with this flag set, an update _of the project_ is
scheduled immediately after creation of the inventory source.
Also, if this flag is set, no inventory updates will be triggered
_unless the scm revision of the project changes_.
_unless the SCM revision of the project changes_.
### RBAC
User needs `use` role to the project in order to use it as a source
project for inventory (this entails permission to run arbitrary scripts).
The user needs the `use` role for the project in order to utilize it as a source
project for the inventory (this entails permission to run arbitrary scripts).
To update the project, they need `update` permission to the project,
even if the update is done indirectly.
### Inventory File Suggestions
The project should show a listing of suggested inventory locations, at the
endpoint `/projects/N/inventories/`, but this is not a comprehensive list of
The project should show a listing of suggested inventory locations at the `/projects/N/inventories/` endpoint, but this is not a comprehensive list of
all paths that could be used as an Ansible inventory because of the wide
range of inclusion criteria. The list will also max out at 50 entries.
The user should be allowed to specify a location manually in the UI.
This listing should be refreshed to latest SCM info on a project update.
This listing should be refreshed to the latest SCM info on a project update.
If no inventory sources use a project as an SCM inventory source, then
the inventory listing may not be refreshed on update.
### Still-to-come 3.2 Changes
As a part of a different feature, it is planned to have all inventory sources
@ -58,28 +60,30 @@ inside of an inventory all update with a single button click. When this
happens for an inventory containing an SCM inventory source, it should
update the project.
### Inventory Source Restriction
Since automatic inventory updates (triggered by a project update) do not
go through the task system, typical protection against conflicting updates
is not available. To avoid problems, only 1 inventory source is allowed for
is not available. To avoid problems, only one inventory source is allowed for
inventories that use this feature. That means that if an inventory source
has `source=scm` and `update_on_project_update=true`, it can be the only
inventory source for its inventory.
## Supported File Syntax
> Any Inventory Ansible supports should be supported by this feature
> Any Inventory Ansible supports should be supported by this feature.
This is accomplished by making use of the `ansible-inventory` command.
the inventory import tower-manage command will check for the existence
The inventory import `tower-manage` command will check for the existence
of `ansible-inventory` and if it is not present, it will call a backported
version of it. The backport is maintained as its own GPL3 licensed
repository.
https://github.com/ansible/ansible-inventory-backport
Because the internal mechanism is different, we need some coverage
Because the internal mechanism is different, there needs to be some coverage
testing with Ansible versions both before and after 2.4.
### Vars Restrictions
@ -106,14 +110,14 @@ will consistently utilize group-level variables.
Some test scenarios to look at:
- Test projects that use scripts
- Test projects that have multiple inventory files in a directory,
group_vars, host_vars, etc.
`group_vars`, `host_vars`, etc.
- Test scripts in the project repo
- Test scripts that use environment variables provided by a credential
in Tower
- Test multiple inventories that use the same project, pointing to different
files / directories inside of the project
- Feature works correctly even if project doesn't have any playbook files
- File related errors should surface as inventory import failures
- File-related errors should surface as inventory import failures
+ missing file
+ invalid syntax in file
- If the project SCM update encounters errors, it should not run the
@ -125,29 +129,28 @@ The API guide should summarize what is in the use details.
Once the UI implementation is done, the product docs should cover its
standard use.
## Update-on-launch
## Update-On-Launch
If the SCM inventory source is configured to follow the project updates,
the `update_on_launch` field cannot be set to `True`. This is because
of concerns related to the task manager job dependency tree.
We should document the alternatives for a user to accomplish the same thing
through in a different way.
Below are some alternative methods which allow a user to accomplish the same thing in a different way:
### Alternative 1: Use same project for playbook
You can make a job template that uses a project as well as an inventory
that updates from that same project. In this case, you can set the project
A user can make a job template that uses a project as well as an inventory
that updates from that same project. In this case, they can set the project
to `update_on_launch`, in which case it will trigger an inventory update
if needed.
### Alternative 2: Use the project in a workflow
If you must use a different project for the playbook than for the inventory
source, then you can still place the project in a workflow and then have
If a user must utilize a different project for the playbook than for the inventory
source, then they can still place the project in a workflow and then have
a job template run on success of the project update.
This is guaranteed to have the inventory update "on time" (by this we mean
This is guaranteed to have the inventory update "on time" (this means
that the inventory changes are complete before the job template is launched),
because the project does not transition to the completed state
until the inventory update is finished.
@ -168,4 +171,3 @@ that contains the `source_project` of the inventory source.
If the inventory source is not configured to update on project update,
then it will inherit the allowed instance groups from its inventory,
like all other inventory syncs.
View File
@ -1,22 +1,22 @@
## Job Branch Override
Background: Projects specify the branch, tag, or reference to use from source control
_Background:_ Projects specify the branch, tag, or reference to use from source control
in the `scm_branch` field.
This "Branch Override" feature allows project admins to delegate branch selection to
admins of job templates that use that project (requiring only project
`use_role`). Admins of job templates can further
delegate that ability to users executing the job template
(requiring only job template `execute_role`) by enabling
`ask_scm_branch_on_launch` on the job template.
admins of Job Templates that use that project (requiring only project
`use_role`). Admins of Job Templates can further
delegate that ability to users executing the Job Template
(requiring only Job Template `execute_role`) by enabling
`ask_scm_branch_on_launch` on the Job Template.
### Source Tree Copy Behavior
Background: Every job run has its own private data directory.
_Background:_ Every job run has its own private data directory.
This folder is temporary, cleaned up at the end of the job run.
This directory contains a copy of the project source tree for the given
`scm_branch` the job is running.
`scm_branch` while the job is running.
A new shallow copy is made for every job run.
Jobs are free to make changes to the project folder and make use of those
@ -26,24 +26,25 @@ changes while it is still running.
With the introduction of this feature, the function of `scm_clean` is watered
down. It will still be possible to enable this function, and it will be
passed through as a parameter to the playbook as a tool for trouble shooting.
Two notable cases that lose support are documented here.
passed through as a parameter to the playbook as a tool for troubleshooting.
Two notable cases that lose support are documented below:
1) Setting `scm_clean` to `true` will no longer persist changes between job runs.
That means that jobs that rely on content which is not committed to source
This means that jobs that rely on content which is not committed to source
control may fail now.
2) Because it is a shallow copy, this folder will not contain the full
git history for git project types.
### Project Revision Concerns
Background of how normal project updates work:
_Background:_
The revision of the default branch (specified as `scm_branch` of the project)
is stored when updated, and jobs using that project will employ this revision.
Providing a non-default `scm_branch` in a job comes with some restrictions
Providing a non-default `scm_branch` in a job comes with some restrictions,
which are unlike the normal update behavior.
If `scm_branch` is a branch identifier (not a commit hash or tag), then
the newest revision is pulled from the source control remote immediately
@ -60,8 +61,9 @@ project default branch.
The `scm_branch` field is not validated, so the project must update
to ensure it is valid.
If `scm_branch` is provided or prompted for, the `playbook` field of
job templates will not be validated, and users will have to launch
the job template in order to verify presence of the expected playbook.
Job Templates will not be validated, and users will have to launch
the Job Template in order to verify presence of the expected playbook.
### Git Refspec
@ -99,7 +101,7 @@ no matter what is used for `scm_refspec`.
The `scm_refspec` will affect which `scm_branch` fields can be used as overrides.
For example, you could set up a project that allows branch override with the
1st or 2nd refspec example, then use this in a job template
that prompts for `scm_branch`, then a client could launch the job template when
first or second refspec example, then use this in a Job Template
that prompts for `scm_branch`, then a client could launch the Job Template when
a new pull request is created, providing the branch `pull/N/head`,
then the job template would run against the provided github pull request reference.
then the Job Template would run against the provided GitHub pull request reference.
View File
@ -1,16 +1,22 @@
## Ansible Callback and Job Events
There is no concept of a job event in Ansible. Job Events are json structures, created when Ansible calls the Tower callback plugin hooks (i.e. `v2_playbook_on_task_start`, `v2_runner_on_ok`, etc.). The Job Event data structures contain data from the parameters of the callback hooks plus unique ids that reference other Job Events. There is usually a 1-1 relationship between a Job Event and an Ansible callback plugin function call.
There is no concept of a job event in Ansible. Job Events are JSON structures, created when Ansible calls the Tower callback plugin hooks (*i.e.*, `v2_playbook_on_task_start`, `v2_runner_on_ok`, etc.). The Job Event data structures contain data from the parameters of the callback hooks plus unique IDs that reference other Job Events. There is usually a one-to-one relationship between a Job Event and an Ansible callback plugin function call.
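For orientation, a hypothetical, stripped-down callback plugin (not Tower's actual plugin) shows how a few of these hooks can be turned into JSON event structures:

```python
# A hypothetical, minimal callback plugin (not Tower's actual plugin) that
# turns a few v2_* hook calls into JSON event structures.
import json
import uuid

from ansible.plugins.callback import CallbackBase


class CallbackModule(CallbackBase):
    CALLBACK_VERSION = 2.0
    CALLBACK_TYPE = 'notification'
    CALLBACK_NAME = 'job_event_sketch'

    def _emit(self, event, **event_data):
        # Tower would persist and ship this structure; here it is just printed.
        print(json.dumps({'uuid': str(uuid.uuid4()), 'event': event, 'event_data': event_data}))

    def v2_playbook_on_start(self, playbook):
        self._emit('playbook_on_start', playbook=getattr(playbook, '_file_name', ''))

    def v2_playbook_on_task_start(self, task, is_conditional):
        self._emit('playbook_on_task_start', task=task.get_name())

    def v2_runner_on_ok(self, result):
        self._emit('runner_on_ok', host=result._host.get_name(), task=result._task.get_name())
```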
## Job Event Relationships
The Job Event relationship is strictly hierarchical. In the example details below, each Job Event bullet is related to the previous Job Event to form a hierarchy.
* There is always 1 and only 1 `v2_playbook_on_start` event and it is the first event.
* `v2_playbook_on_play_start` is generated once per-play in the playbook. 2 such events would be generated from the playbook example below.
* The `v2_playbook_on_task_start` function is called once for each task under the default execution strategy. Other execution strategies (i.e. free or serial) can result in the `v2_playbook_on_task_start` function being multiple times, one for each host. Tower only creates a Job Event for the **first** `v2_playbook_on_task_start` call. Subsequent calls for the same task do **not** result in Job Events being created.
* `v2_runner_on_[ok, failed, skipped, unreachable, retry, item_on_ok, item_on_failed, item_on_skipped]` One `v2_runner_on_...` Job Event will be created for each `v2_playbook_on_task_start` event.
The Job Event relationship is strictly hierarchical. In the example details below, each Job Event bullet is related to the previous Job Event to form a hierarchy:
* There is always one and only one `v2_playbook_on_start` event and it is the first event.
* `v2_playbook_on_play_start` is generated once per-play in the playbook; two such events would be generated from the playbook example below.
* The `v2_playbook_on_task_start` function is called once for each task under the default execution strategy. Other execution strategies (*i.e.*, free or serial) can result in the `v2_playbook_on_task_start` function being called multiple times, one for each host. Tower only creates a Job Event for the **first** `v2_playbook_on_task_start` call. Subsequent calls for the same task do **not** result in Job Events being created.
* `v2_runner_on_[ok, failed, skipped, unreachable, retry, item_on_ok, item_on_failed, item_on_skipped]`; one `v2_runner_on_...` Job Event will be created for each `v2_playbook_on_task_start` event.
## Example
Below is an example inventory and playbook outline along with an example of Job Events generated and their hierarchical relationship.
Below is an example inventory and playbook outline, along with the Job Events generated and their hierarchical relationship:
```
# inventory
[tower]
@ -45,7 +51,8 @@ hostC
when: inventory_hostname == 'C'
```
Below is a visualization of how Job Events are related to form a hierarchy given a run of the playbook above.
Below is a visualization of how Job Events are related to form a hierarchy given a run of the playbook above:
```
`-- playbook_on_start
|-- playbook_on_play_start-preflight
@ -69,8 +76,12 @@ Below is a visualization of how Job Events are related to form a hierarchy given
`-- runner_on_ok_hostC
```
## Job Event Creation Patterns
Ansible execution strategy heavily influences the creation order of Job Events. The above examples of Job Events creation an hierarchy are also the order in which they are created when the Ansible default execution strategy is used. When other strategies like free and serial are used, the order in which Job Events are created is slightly different. Let's take the previous example playbook and Job Events and show the order in which the Job Events may be created when the free strategy is used. Notice how `runner_on_*` Job Events can be created **after** a `playbook_on_task_start` for the next task runs. This is not the case for the default Ansible execution strategy. Under the default Ansible execution strategy, all `runner_on_*` Job Events will be created before the next `playbook_on_task_start` is generated.
The Ansible execution strategy heavily influences the creation order of Job Events. The Job Event hierarchy in the examples above also reflects the order in which the events are created when the default Ansible execution strategy is used. When other strategies like `free` and `serial` are used, the order in which Job Events are created is slightly different.
Let's take the previous example playbook and Job Events and show the order in which the Job Events may be created when the free strategy is used. Notice how `runner_on_*` Job Events can be created **after** a `playbook_on_task_start` for the next task runs. This is not the case for the default Ansible execution strategy. Under the default Ansible execution strategy, all `runner_on_*` Job Events will be created before the next `playbook_on_task_start` is generated:
```
playbook_on_start
@ -95,9 +106,13 @@ playbook_on_play_start-install
runner_on_ok_hostA (install_tower)
```
## Testing
A management command exists for replaying job events at varying speeds and with other parameters. Run `awx-manage replay_job_events --help` for additional usage information. To prepare the UI for event replay, load the page for a finished job and then append `_debug` as a parameter to the URL.
## Code References
* More comprehensive list of Job Events and the hierarchy they form https://github.com/ansible/awx/blob/devel/awx/main/models/jobs.py#L870
* Exhaustive list of Job Events in Tower https://github.com/ansible/awx/blob/devel/awx/main/models/jobs.py#L900
* For a more comprehensive list of Job Events and the hierarchy they form, go here: https://github.com/ansible/awx/blob/devel/awx/main/models/jobs.py#L870
* Exhaustive list of Job Events in Tower: https://github.com/ansible/awx/blob/devel/awx/main/models/jobs.py#L900
View File
@ -1,13 +1,15 @@
# Job Slicing Overview
Ansible, by default, runs jobs from a single control instance. At best a single Ansible job can be sliced up on a single system via forks but this doesn't fully take advantage of AWX's ability to distribute work to multiple nodes in a cluster.
Ansible, by default, runs jobs from a single control instance. At best, a single Ansible job can be sliced up on a single system via forks but this doesn't fully take advantage of AWX's ability to distribute work to multiple nodes in a cluster.
Job Slicing solves this by adding a Job Template field `job_slice_count`. This field specifies the number of **Jobs** to slice the Ansible run into. When this number is greater than 1 ``AWX`` will generate a **Workflow** from a **JobTemplate** instead of a **Job**. The **Inventory** will be distributed evenly amongst the slice jobs. The workflow job is then started and proceeds as though it were a normal workflow. The API will return either a **Job** resource (if `job_slice_count` < 2) or a **WorkflowJob** resource otherwise. Likewise, the UI will redirect to the appropriate screen to display the status of the run.
Job Slicing solves this problem by adding a Job Template field `job_slice_count`. This field specifies the number of **Jobs** to slice the Ansible run into. When this number is greater than one, `AWX` will generate a **Workflow** from a **Job Template** instead of a **Job**. The **Inventory** will be distributed evenly amongst the sliced jobs. The workflow job is then started and proceeds as though it were a normal workflow. The API will return either a **Job** resource (if `job_slice_count` < 2) or a **WorkflowJob** resource otherwise. Likewise, the UI will redirect to the appropriate screen to display the status of the run.
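A hypothetical sketch of the even distribution (this is not the actual AWX scheduler code) could look like this:

```python
# A hypothetical sketch (not the actual AWX code) of distributing an
# inventory's hosts evenly across job_slice_count sliced jobs.
def slice_hosts(hosts, job_slice_count):
    """Assign hosts to slices round-robin and return one host list per slice."""
    slices = [[] for _ in range(job_slice_count)]
    for index, host in enumerate(hosts):
        slices[index % job_slice_count].append(host)
    return slices

hosts = ["hostA", "hostB", "hostC", "hostD", "hostE"]
for number, chunk in enumerate(slice_hosts(hosts, job_slice_count=2), start=1):
    print(f"slice {number}: {chunk}")
# slice 1: ['hostA', 'hostC', 'hostE']
# slice 2: ['hostB', 'hostD']
```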
## Implications for Job execution
When jobs are sliced they can run on any Tower node and some may not run at the same time. Because of this, anything that relies on setting/sliced state (using modules such as ``set_fact``) will not work as expected. It's reasonable to expect that not all jobs will actually run at the same time (if there is not enough capacity in the system for example)
## Implications for Job Execution
When jobs are sliced, they can run on any Tower node; however, some may not run at the same time. Because of this, anything that relies on setting/sliced state (using modules such as `set_fact`) will not work as expected. It's reasonable to expect that not all jobs will actually run at the same time (*e.g.*, if there is not enough capacity in the system).
## Simultaneous Execution Behavior
By default Job Templates aren't normally configured to execute simultaneously (``allow_simultaneous`` must be checked). Slicing overrides this behavior and implies ``allow_simultaneous`` even if that setting is unchecked.
By default, Job Templates aren't normally configured to execute simultaneously (`allow_simultaneous` must be checked). Slicing overrides this behavior and implies `allow_simultaneous`, even if that setting is not selected.
View File
@ -1,35 +1,32 @@
# Integration with Third-Party Log Aggregators
# Integration With Third-Party Log Aggregators
This feature builds in the capability to send detailed logs to several kinds
of 3rd party external log aggregation services. Services connected to this
of third party external log aggregation services. Services connected to this
data feed should be useful in order to gain insights into Tower usage
or technical trends. The data is intended to be
sent in JSON format via three ways: over a HTTP connection, a direct TCP
connection or a direct UDP connection. It uses minimal service-specific
sent in JSON format via three ways: over an HTTP connection, a direct TCP
connection, or a direct UDP connection. It uses minimal service-specific
tweaks engineered in a custom handler or via an imported library.
## Loggers
This features introduces several new loggers which are intended to
deliver a large amount of information in a predictable structured format,
This feature introduces several new loggers which are intended to
deliver a large amount of information in a predictable and structured format,
following the same structure as one would expect if obtaining the data
from the API. These data loggers are the following.
from the API. These data loggers are the following:
- awx.analytics.job_events
- Data returned from the Ansible callback module
- awx.analytics.activity_stream
- Record of changes to the objects within the Ansible Tower app
- awx.analytics.system_tracking
- Data gathered by Ansible scan modules ran by scan job templates
- `awx.analytics.job_events` - Data returned from the Ansible callback module
- `awx.analytics.activity_stream` - Record of changes to the objects within the Ansible Tower app
- `awx.analytics.system_tracking` - Data gathered by Ansible scan modules ran by scan job templates
These loggers only use log-level of INFO.
Additionally, the standard Tower logs are be deliverable through this
same mechanism. It should be obvious to the user how to enable to disable
each of these 5 sources of data without manipulating a complex dictionary
in their local settings file, as well as adjust the log-level consumed
These loggers only use log-level of `INFO`. Additionally, the standard Tower logs are deliverable through this
same mechanism. It should be obvious to the user how to enable or disable
each of these five sources of data without manipulating a complex dictionary
in their local settings file, as well as adjust the log level consumed
from the standard Tower logs.
## Supported Services
Committed to support:
@ -45,18 +42,19 @@ Have tested:
Considered, but have not tested:
- Datadog
- Red Hat Common Logging via logstash connector
- Red Hat Common Logging via Logstash connector
### Elastic Search Instructions
In the development environment, the server can be started up with the
log aggregation services attached via the Makefile targets. This starts
up the 3 associated services of Logstash, Elastic Search, and Kibana
as their own separate containers individually.
up the three associated services of Logstash, Elastic Search, and Kibana
as their own separate containers.
In addition to running these services, it establishes connections to the
tower_tools containers as needed. This is derived from the docker-elk
project. (https://github.com/deviantony/docker-elk)
`tower_tools` containers as needed. This is derived from the [`docker-elk`
project](https://github.com/deviantony/docker-elk):
```bash
# Start a single server with links
@ -65,12 +63,12 @@ make docker-compose-elk
make docker-compose-cluster-elk
```
For more instructions on getting started with the environment this stands
up, also refer to instructions in `/tools/elastic/README.md`.
For more instructions on getting started with the environment that this example spins
up, also refer to instructions in [`/tools/elastic/README.md`](https://github.com/ansible/awx/blob/devel/tools/elastic/README.md).
If you were to start from scratch, standing up your own version the elastic
stack, then the only change you should need is to add the following lines
to the logstash `logstash.conf` file.
If you were to start from scratch, standing up your own version of the Elastic
Stack, then the only change you should need is to add the following lines
to the Logstash `logstash.conf` file:
```
filter {
@ -80,72 +78,77 @@ filter {
}
```
#### Debugging and Pitfalls
Backward-incompatible changes were introduced with Elastic 5.0.0, and
customers may need different configurations depending on what
versions they are using.
# Log Message Schema
Common schema for all loggers:
| Field | Information |
|-----------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| cluster_host_id | (string) unique identifier of the host within the Tower cluster |
| level | (choice of DEBUG, INFO, WARNING, ERROR, etc.) Standard python log level, roughly reflecting the significance of the event All of the data loggers as a part of this feature use INFO level, but the other Tower logs will use different levels as appropriate |
| logger_name | (string) Name of the logger we use in the settings, for example, "awx.analytics.activity_stream" |
| @timestamp | (datetime) Time of log |
| path | (string) File path in code where the log was generated |
| `cluster_host_id` | (string) Unique identifier of the host within the Tower cluster |
| `level` | (choice of `DEBUG`, `INFO`, `WARNING`, `ERROR`, etc.) Standard python log level, roughly reflecting the significance of the event; all of the data loggers (as a part of this feature) use `INFO` level, but the other Tower logs will use different levels as appropriate |
| `logger_name` | (string) Name of the logger we use in the settings, *e.g.*, "`awx.analytics.activity_stream`" |
| `@timestamp` | (datetime) Time of log |
| `path` | (string) File path in code where the log was generated |
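An illustrative record following this common schema (all values here are made up) might look like:

```python
# An illustrative log record (made-up values) following the common schema above.
common_log_record = {
    "cluster_host_id": "towerhost-1",                  # unique host ID within the cluster
    "level": "INFO",                                   # the data loggers always use INFO
    "logger_name": "awx.analytics.activity_stream",
    "@timestamp": "2019-09-23T14:30:12.000Z",
    "path": "awx/main/signals.py",                     # illustrative code path
}
```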
## Activity Stream Schema
| Field | Information |
|-------------------|-------------------------------------------------------------------------------------------------------------------------|
| (common) | this uses all the fields common to all loggers listed above |
| actor | (string) username of the user who took the action documented in the log |
| changes | (string) unique identifier of the host within the Tower cluster |
| operation | (choice of several options) the basic category of the changed logged in the activity stream, for instance, "associate". |
| object1 | (string) Information about the primary object being operated on, consistent with what we show in the activity stream |
| object2 | (string) if applicable, the second object involved in the action |
| (common) | This uses all the fields common to all loggers listed above |
| actor | (string) Username of the user who took the action documented in the log |
| changes | (string) Summary of the changes made to the object in the logged action |
| operation | (choice of several options) The basic category of the change logged in the Activity Stream, for instance, "associate". |
| object1 | (string) Information about the primary object being operated on, consistent with what we show in the Activity Stream |
| object2 | (string) If applicable, the second object involved in the action |
## Job Event Schema
This logger echoes the data being saved into job events, except when they
would otherwise conflict with expected standard fields from the logger,
in which case the fields are named differently.
Notably, the field `host` on the job_event model is given as `event_host`.
This logger echoes the data being saved into Job Events, except when they
would otherwise conflict with expected standard fields from the logger (in which case the fields are named differently).
Notably, the field `host` on the `job_event` model is given as `event_host`.
There is also a sub-dictionary field `event_data` within the payload,
which will contain different fields depending on the specifics of the
Ansible event.
This logger also includes the common fields.
## Scan / Fact / System Tracking Data Schema
These contain a detailed dictionary-type field either services,
These contain a detailed dictionary-type field for either services,
packages, or files.
| Field | Information |
|--------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| (common) | this uses all the fields common to all loggers listed above |
| services | (dict, optional) For services scans, this field is included and has keys based on the name of the service NOTE: Periods are disallowed by elastic search in names, and are replaced with "_" by our log formatter |
| (common) | This uses all the fields common to all loggers listed above |
| services | (dict, optional) For services scans, this field is included and has keys based on the name of the service. **NOTE:** Periods are disallowed by Elastic Search in names, and are replaced with `"_"` by our log formatter |
| packages | (dict, optional) Included for log messages from package scans |
| files | (dict, optional) Included for log messages from file scans |
| host | (str) name of host scan applies to |
| inventory_id | (int) inventory id host is inside of
| host | (str) Name of host that the scan applies to |
| inventory_id | (int) ID of the inventory that the host is inside of |
## Job Status Changes
This is intended to be a lower-volume source of information about
changes in job states compared to job events, and also intended to
capture changes to types of unified jobs other than job template based
changes in job states compared to Job Events, and also intended to
capture changes to types of Unified Jobs other than Job Template-based
jobs.
In addition to common fields, these logs include fields present on
the job model.
## Tower Logs
In addition to the common fields, this will contain a `msg` field with
@ -153,6 +156,7 @@ the log message. Errors contain a separate `traceback` field.
These logs can be enabled or disabled in CTiT by adding them to or removing them from the `LOG_AGGREGATOR_LOGGERS` setting.
# Configuring Inside of Tower
Parameters needed in order to configure the connection to the log
@ -167,8 +171,8 @@ supported services:
- A flag to indicate how system tracking records will be sent
- Selecting which loggers to send
- Enabling sending logs
- Connection type, which can be HTTPS, TCP and UDP.
- Timeout value if connection type is based on TCP protocol (HTTPS and TCP).
- Connection type (HTTPS, TCP, or UDP)
- Timeout value if connection type is based on TCP protocol (HTTPS and TCP)
Some settings for the log handler will not be exposed to the user via
this mechanism. For example, threading (enabled).
@ -186,28 +190,29 @@ connection, Port field is supposed to be provided and Host field is supposed to
contain hostname only. If instead a URL is entered in Host field, its hostname
portion will be extracted as the actual hostname.
# Acceptance Criteria Notes
Connection: Testers need to replicate the documented steps for setting up
**Connection:** Testers need to replicate the documented steps for setting up
and connecting with a destination log aggregation service, if that is
an officially supported service. That will involve 1) configuring the
settings, as documented, 2) taking some action in Tower that causes a log
message from each type of data logger to be sent and 3) verifying that
the content is present in the log aggregation service.
Schema: After the connection steps are completed, a tester will need to create
**Schema:** After the connection steps are completed, a tester will need to create
an index. We need to confirm that no errors are thrown in this process.
It also needs to be confirmed that the schema is consistent with the
documentation. In the case of Splunk, we need basic confirmation that
the data is compatible with the existing app schema.
Tower logs: Formatting of Traceback message is a known issue in several
**Tower logs:** Formatting of Traceback message is a known issue in several
open-source log handlers, so we should confirm that server errors result
in the log aggregator receiving a well-formatted multi-line string
with the traceback message.
Log messages should be sent outside of the
request-response cycle. For example, loggly examples use
request-response cycle. For example, Loggly examples use
`requests_futures.sessions.FuturesSession`, which does some
threading work to fire the message without interfering with other
operations. A timeout on the part of the log aggregation service should
View File
@ -1,8 +1,11 @@
Starting from API V2, the named URL feature lets user access Tower resources via resource-specific human-readable identifiers. Before the only way of accessing a resource object without auxiliary query string is via resource primary key number, for example, via URL path `/api/v2/hosts/2/`. Now users can use named URL to do the same thing, for example, via URL path `/api/v2/hosts/host_name++inv_name++org_name/`.
Starting from API V2, the Named URL feature lets users access Tower resources via resource-specific human-readable identifiers. Previously, the only way of accessing a resource object without an auxiliary query string was via its primary key number (*e.g.*, via URL path `/api/v2/hosts/2/`). Now users can use a named URL to do the same thing, for example, via URL path `/api/v2/hosts/host_name++inv_name++org_name/`.
## Usage
There are two named-URL-related Tower configuration setting available under `/api/v2/settings/named-url/`: `NAMED_URL_FORMATS` and `NAMED_URL_GRAPH_NODES`. `NAMED_URL_FORMATS` is a *read only* key-value pair list of all available named URL identifier formats. A typical `NAMED_URL_FORMATS` looks like this:
There are two named-URL-related Tower configuration settings available under `/api/v2/settings/named-url/`: `NAMED_URL_FORMATS` and `NAMED_URL_GRAPH_NODES`.
`NAMED_URL_FORMATS` is a *read only* key-value pair list of all available named URL identifier formats. A typical `NAMED_URL_FORMATS` looks like this:
```
"NAMED_URL_FORMATS": {
"job_templates": "<name>",
@ -27,49 +30,64 @@ There are two named-URL-related Tower configuration setting available under `/ap
```
For each item in `NAMED_URL_FORMATS`, the key is the API name of the resource that can have a named URL, and the value is a string indicating how to form a human-readable unique identifier for that resource. A typical procedure for composing the named URL for a specific resource object using `NAMED_URL_FORMATS` is given below:
Suppose that a user wants to manually determine the named URL for a label with `id` 5. She should first look up `labels` field of `NAMED_URL_FORMATS` and get the identifier format `<name>++<organization.name>`. The first part of the URL format is `<name>`, that indicates she should get the label resource detail, `/api/v2/labels/5/`, and look for `name` field in returned JSON. Suppose the user has `name` field with value 'Foo', then the first part of our unique identifier is `Foo`; The second part of the format are double pluses `++`. That is the delimiter that separate different parts of a unique identifier so simply append them to unique identifier to get `Foo++`; The third part of the format is `<organization.name>`, This indicates that field is not in the current label object under investigation, but in an organization which the label object points to. Thus, as the format indicates, the user should look up `organization` in `related` field of current returned JSON. That field may or may not exist, if it exists, follow the URL given in that field, say `/api/v2/organizations/3/`, to get the detail of the specific organization, extract its `name` field, say 'Default', and append it to our current unique identifier. Since `<organizations.name>` is the last part of format, we end up generating unique identifier for underlying label and have our named URL ready: `/api/v2/labels/Foo++Default/`. In the case where `organization` does not exist in `related` field of label object detail, we append empty string `''` instead, which essentially does not alter the current identifier. So `Foo++` becomes final unique identifier and thus generate named URL to be `/api/v2/labels/Foo++/`.
Suppose that a user wants to manually determine the named URL for a label with `id` `5`. She should first look up the `labels` field of `NAMED_URL_FORMATS` and get the identifier format `<name>++<organization.name>`. The first part of the URL format is `<name>`, which indicates that she should get the label resource detail, `/api/v2/labels/5/`, and look for the `name` field in the returned JSON.
An important aspect of generating unique identifier for named URL is dealing with reserved characters. Because the identifier is part of a URL, the following reserved characters by URL standard should be escaped to its percentage encoding: `;/?:@=&[]`. For example, if an organization is named `;/?:@=&[]`, its unique identifier should be `%3B%2F%3F%3A%40%3D%26%5B%5D`. Another special reserved character is `+`, which is not reserved by URL standard but used by named URL to link different parts of an identifier. It is escaped by `[+]`. For example, if an organization is named `[+]`, tis unique identifier is `%5B[+]%5D`, where original `[` and `]` are percent encoded and `+` is converted to `[+]`.
Suppose the label's `name` field has the value `'Foo'`; then the first part of our unique identifier is `Foo`. The second part of the format is the double plus `++`. That is the delimiter that separates different parts of a unique identifier, so simply append it to the unique identifier to get `Foo++`.
`NAMED_URL_FORMATS` exclusively lists every resource that can have named URL, any resource not listed there has no named URL. `NAMED_URL_FORMATS` alone should be instructive enough for users to compose human-readable unique identifier and named URL themselves. For more convenience, every object of a resource that can have named URL will have a related field `named_url` that displays that object's named URL. Users can simply copy-paste that field for their custom usages. Also, users are expected to see indications in help text of API browser if a resource object has named URL.
The third part of the format is `<organization.name>`, which indicates that the field is not in the current label object under investigation, but in an organization which the label object points to. Thus, as the format indicates, the user should look up `organization` in the `related` field of the current returned JSON. That field may or may not exist; if it exists, follow the URL given in that field, say `/api/v2/organizations/3/`, to get the detail of the specific organization, extract its `name` field (*e.g.*, `'Default'`), and append it to the current unique identifier. Since `<organization.name>` is the last part of the format, we end up generating the unique identifier for the underlying label and have our named URL ready: `/api/v2/labels/Foo++Default/`.
Although `NAMED_URL_FORMATS` is immutable on user side, it will be automatically modified and expanded over time, reflecting underlying resource modification and expansion. Please consult `NAMED_URL_FORMATS` on the same Tower cluster where you want to use named url feature against.
In the case where `organization` does not exist in the `related` field of the label object detail, we append an empty string `''` instead, which essentially does not alter the current identifier. So `Foo++` becomes the final unique identifier, and the generated named URL is `/api/v2/labels/Foo++/`.
An important aspect of generating unique identifiers for named URL is dealing with reserved characters. Because the identifier is part of a URL, the following characters reserved by the URL standard should be escaped to their percent encoding: `;/?:@=&[]`. For example, if an organization is named `;/?:@=&[]`, its unique identifier should be `%3B%2F%3F%3A%40%3D%26%5B%5D`. Another special reserved character is `+`, which is not reserved by the URL standard but is used by named URL to link different parts of an identifier. It is escaped by `[+]`. For example, if an organization is named `[+]`, its unique identifier is `%5B[+]%5D`, where the original `[` and `]` are percent-encoded and `+` is converted to `[+]`.
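A hypothetical helper applying these escaping rules (this is not the shipped AWX code):

```python
# A hypothetical helper (not the shipped AWX code) that applies the escaping
# rules above to one part of a named URL identifier.
from urllib.parse import quote

def escape_named_url_part(text):
    # Percent-encode reserved characters (including ;/?:@=&[]), leaving '+'
    # untouched so it can get the named-URL-specific escape '[+]'.
    return quote(text, safe="+").replace("+", "[+]")

print(escape_named_url_part(";/?:@=&[]"))  # -> %3B%2F%3F%3A%40%3D%26%5B%5D
print(escape_named_url_part("[+]"))        # -> %5B[+]%5D
```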
`NAMED_URL_FORMATS` exclusively lists every resource that can have named URL; any resource not listed there has no named URL. `NAMED_URL_FORMATS` alone should be instructive enough for users to compose human-readable unique identifier and named URL themselves. For more convenience, every object of a resource that can have named URL will have a related field `named_url` that displays that object's named URL. Users can simply copy-paste that field for their custom usages. Also, users are expected to see indications in the help text of the API browser if a resource object has named URL.
Although `NAMED_URL_FORMATS` is immutable on the user side, it will be automatically modified and expanded over time, reflecting underlying resource modification and expansion. Please consult `NAMED_URL_FORMATS` on the same Tower cluster where you want to use the named URL feature.
`NAMED_URL_GRAPH_NODES` is another *read-only* list of key-value pairs that exposes the internal graph data structure that Tower uses to manage named URLs. This is not supposed to be human-readable but should be used for programmatically generating named URLs. An example script of generating a named URL given the primary key of arbitrary resource objects that can have named URL (using info provided by `NAMED_URL_GRAPH_NODES`) can be found as `/tools/scripts/pk_to_named_url.py`.
`NAMED_URL_GRAPH_NODES` is another *read-only* list of key-value pairs that exposes the internal graph data structure Tower used to manage named URLs. This is not supposed to be human-readable but should be used for programmatically generating named URLs. An example script of generating named URL given the primary key of arbitrary resource objects that can have named URL, using info provided by `NAMED_URL_GRAPH_NODES`, can be found as `/tools/scripts/pk_to_named_url.py`.
## Identifier Format Protocol
Resources in Tower are identifiable by their unique keys, which are basically tuples of resource fields. Every Tower resource is guaranteed to have its primary key number alone as a unique key, but there might be multiple other unique keys. A resource can generate identifier format thus have named URL if it contains at least one unique key that satisfies rules below:
Resources in Tower are identifiable by their unique keys, which are basically tuples of resource fields. Every Tower resource is guaranteed to have its primary key number alone as a unique key, but there might be multiple other unique keys.
A resource can generate identifier formats and thus have named URL if it contains at least one unique key that satisfies rules below:
1. The key *contains and only contains* fields that are either the `name` field, or text fields with a finite number of possible choices (like credential type resource's `kind` field).
2. The only allowed exceptional fields that breaks rule 1 is a many-to-one related field relating to a resource *other than self* which is also allowed to have a slug.
2. The only allowed exception to the first rule is a many-to-one related field relating to a resource *other than self*, which is also allowed to have a slug.
Here is an example for understanding the rules: Suppose Tower has resources `Foo` and `Bar`, both `Foo` and `Bar` contain a `name` field and a `choice` field that can only have value 'yes' or 'no'. Additionally, resource `Foo` contains a many-to-one field (a foreign key) relating to `Bar`, say `fk`. `Foo` has a unique key tuple `(name, choice, fk)` and `Bar` has a unique key tuple `(name, choice)`. Apparently `Bar` can have named URL because it satisfies rule 1. On the other hand, `Foo` can also have named URL, because although `Foo` breaks rule 1, the extra field breaking rule 1 is `fk` field, which is many-to-one-related to `Bar` and `Bar` can have named URL.
Here is an example for understanding the rules: Suppose Tower has resources `Foo` and `Bar`; both `Foo` and `Bar` contain a `name` field and a `choice` field that can only have value `'yes'` or `'no'`. Additionally, resource `Foo` contains a many-to-one field (a foreign key) relating to `Bar`, say `fk`. `Foo` has a unique key tuple `(name, choice, fk)` and `Bar` has a unique key tuple `(name, choice)`. Apparently `Bar` can have named URL because it satisfies rule 1. On the other hand, `Foo` can also have named URL, because although `Foo` breaks rule 1, the extra field breaking rule 1 is a `fk` field, which is many-to-one-related to `Bar` and `Bar` can have named URL.
For resources satisfying rule 1 above, their human-readable unique identifiers are combinations of foreign key fields, delimited by `+`. In specific, resource `Bar` above will have slug format `<name>+<choice>`. Note the field order matters in slug format: `name` field always comes the first if present, following by all the rest fields arranged in lexicographic order of field name. For example, if `Bar` also has an `a_choice` field satisfying rule 1 and the unique key becomes `(name, choice, a_choice)`, its slug format becomes `<name>+<a_choice>+<choice>`.
For resources satisfying rule 1 above, their human-readable unique identifiers are combinations of foreign key fields, delimited by `+`. Specifically, resource `Bar` above will have the slug format `<name>+<choice>`. Note the field order matters in slug format: `name` field always comes first if present, followed by all the rest of the fields arranged in lexicographic order of field name. For example, if `Bar` also has an `a_choice` field satisfying rule 1 and the unique key becomes `(name, choice, a_choice)`, its slug format becomes `<name>+<a_choice>+<choice>`.
For resources satisfying rule 2 above instead, if we trace back via the extra foreign key fields, we end up getting a tree of resources that all together identify objects of that resource. In order to generate identifier format, each resource in the traceback tree generates its own part of standalone format in the way described in the last paragraph, using all fields but the foreign keys. Finally all parts are combined by `++` in the following order:
* Put stand-alone format as the first identifier component.
* Recursively generate unique identifiers for each resource the underlying resource is pointing to using foreign key (a child of a traceback tree node).
For resources satisfying rule 2 above instead, if we trace back via the extra foreign key fields, we end up with a tree of resources that together identify objects of that resource. In order to generate the identifier format, each resource in the traceback tree generates its own standalone format in the way described in the last paragraph, using all fields but the foreign keys. Finally, all parts are combined with `++` in the following order:
* Put standalone format as the first identifier component.
* Recursively generate unique identifiers for each resource the underlying resource is pointing to by using a foreign key (a child of a traceback tree node).
* Treat the generated unique identifiers as the remaining identifier components. Sort them in lexicographic order of the corresponding foreign key.
* Combine all components together using `++` to generate the final identifier format.
Back to the example above, when generating identifier format for resource `Foo`, we firstly generate stand-alone formats, `<name>+<choice>` for `Foo` and `<fk.name>+<fk.choice>` for `Bar`, then combine them together to be `<name>+<choice>++<fk.name>+<fk.choice>`.
Back to the example above: when generating the identifier format for resource `Foo`, we first generate the standalone formats, `<name>+<choice>` for `Foo` and `<fk.name>+<fk.choice>` for `Bar`, then combine them into `<name>+<choice>++<fk.name>+<fk.choice>`.
When generating identifiers according to the given identifier format, there are cases where a foreign key may points to nowhere. In this case we substitute the part of the format corresponding to the resource the foreign key should point to with an empty string `''`. For example, if a `Foo` object has `name` to be 'alice', `choice` to be 'yes', but `fk` field `None`, its identifier will look like `alice+yes++`.
When generating identifiers according to the given identifier format, there are cases where a foreign key might point nowhere. In this case, we substitute the part of the format corresponding to the resource the foreign key should point to with an empty string `''`. For example, if a `Foo` object has `name` set to `'alice'`, `choice` set to `'yes'`, but its `fk` field set to `None`, its identifier will look like `alice+yes++`.
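To make these rules concrete, here is a minimal, hypothetical Python sketch (not the AWX implementation) that builds the `Foo` slug from the example above, assuming reserved characters in field values are percent-encoded:
```python
from collections import namedtuple
from urllib.parse import quote

Bar = namedtuple("Bar", "name choice")
Foo = namedtuple("Foo", "name choice fk")

def standalone_slug(obj, fields):
    # `name` comes first if present; the remaining fields follow in
    # lexicographic order, each value percent-encoded.
    ordered = [f for f in ("name",) if f in fields]
    ordered += sorted(f for f in fields if f != "name")
    return "+".join(quote(str(getattr(obj, f)), safe="") for f in ordered)

def foo_slug(foo):
    # Standalone part for Foo, then the part for the Bar object its fk points
    # to, joined by '++'; a dangling fk contributes an empty string.
    bar_part = standalone_slug(foo.fk, ("name", "choice")) if foo.fk else ""
    return standalone_slug(foo, ("name", "choice")) + "++" + bar_part

print(foo_slug(Foo("alice", "yes", None)))             # alice+yes++
print(foo_slug(Foo("alice", "yes", Bar("b", "no"))))   # alice+yes++b+no
```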
## Implementation Overview
Module `awx.main.utils.named_url_graph` stands at the core of named URL implementation. It exposes a single public function `generate_graph`. `generate_graph` accepts a list of Tower models in tower that might have named URL (meaning having corresponding endpoint under `/api/v2/`), filter out those that turns out not able to have named URL, and connect the rest together into a named URL graph. The graph is available as a settings option `NAMED_URL_GRAPH` and each node of it contains all info needed to generate named URL identifier formats and parse incoming named URL identifiers.
`generate_graph` will run only once for each Tower WSGI process. This is guaranteed by putting the function call inside `__init__` of `URLModificationMiddleware`. When an incoming request enters `URLModificationMiddleware`, the part of its URL path that could contain a valid named URL identifier is extracted and processed to find (possible) corresponding resource object. The internal process is basically crawling against part of the named URL graph. If the object is found, the identifier part of the URL path is converted to the object's primary key. From now forward Tower can treat the request with the old-styled URL.
Module `awx.main.utils.named_url_graph` stands at the core of the named URL implementation. It exposes a single public function, `generate_graph`. `generate_graph` accepts a list of Tower models that might have named URLs (meaning they have corresponding endpoints under `/api/v2/`), filters out those that cannot have named URLs, and connects the rest together into a named URL graph. The graph is available as a settings option, `NAMED_URL_GRAPH`, and each node of it contains all info needed to generate named URL identifier formats and parse incoming named URL identifiers.
`generate_graph` will run only once for each Tower WSGI process. This is guaranteed by putting the function call inside `__init__` of `URLModificationMiddleware`. When an incoming request enters `URLModificationMiddleware`, the part of its URL path that could contain a valid named URL identifier is extracted and processed to find the (possibly) corresponding resource object. The internal process is basically a crawl over part of the named URL graph. If the object is found, the identifier part of the URL path is converted to the object's primary key. From that point on, Tower can treat the request as if it used the old-style URL.
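A rough sketch of that conversion idea (hypothetical helper names, not the actual `URLModificationMiddleware` code), assuming a `resolve_named_url(resource, identifier)` callable that crawls the graph and returns the matching object or `None`:
```python
import re

NAMED_URL_PATTERN = re.compile(r"^/api/v2/(?P<resource>[^/]+)/(?P<ident>[^/]+)/")

def convert_path(path, resolve_named_url):
    # Rewrite a named-URL path to its pk-based equivalent; leave pk-style and
    # non-matching paths untouched.
    match = NAMED_URL_PATTERN.match(path)
    if not match or match.group("ident").isdigit():
        return path
    obj = resolve_named_url(match.group("resource"), match.group("ident"))
    if obj is None:
        return path  # unknown identifier; the view will return its usual 404
    return path.replace(match.group("ident"), str(obj.pk), 1)
```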
## Acceptance Criteria
In general, acceptance should follow what's in 'Usage' section. The contents in 'Identifier Format Protocol' section should not be relevant.
In general, acceptance should follow what's in the "Usage" section. The contents in the "Identifier Format Protocol" section should not be relevant.
* The classical way of getting objects via primary keys should behave the same.
* Tower configuration part of named URL should work as described. Particularly, `NAMED_URL_FORMATS` should be immutable on user side and display accurate named URL identifier format info.
* Tower configuration for named URL should work as described. In particular, `NAMED_URL_FORMATS` should be immutable on the user's side and display accurate named URL identifier format info.
* `NAMED_URL_FORMATS` should be exclusive, meaning resources specified in `NAMED_URL_FORMATS` should have named URLs, and resources not specified there should *not* have named URLs.
* If a resource can have named URL, its objects should have a `named_url` field which represents the object-specific named URL. That field should only be visible under detail view, not list view.
* A user following the rules specified in `NAMED_URL_FORMATS` should be able to generate a named URL identical to the object's `named_url` field (see the sketch after this list).
* A user should be able to access specified resource objects via accurately generated named URL. This includes not only the object itself but also its related URLs, like if `/api/v2/res_name/obj_slug/` is valid, `/api/v2/res_name/obj_slug/related_res_name/` should also be valid.
* A user should not be able to access specified resource objects if the given named URL is inaccurate. For example, reserved characters not correctly escaped, or components whose corresponding foreign key field pointing nowhere is not replaced by empty string.
* A user should be able to access specified resource objects via an accurately-generated named URL. This includes not only the object itself but also its related URLs; for example, if `/api/v2/res_name/obj_slug/` is valid, then `/api/v2/res_name/obj_slug/related_res_name/` should also be valid.
* A user should not be able to access specified resource objects if the given named URL is inaccurate; for example, when reserved characters are not correctly escaped, or when a component whose corresponding foreign key field points nowhere is not replaced by an empty string.
* A user should be able to dynamically generate named URLs by utilizing `NAMED_URL_GRAPH_NODES`.
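To illustrate the access-related criteria above, a quick check with the `requests` library might look like the sketch below (hypothetical host, credentials, and organization name, and assuming the organization slug format is simply `<name>`):
```python
import requests
from urllib.parse import quote

base = "https://awx.example.com"   # hypothetical host
auth = ("admin", "password")       # hypothetical credentials

# Reserved characters in the name must be percent-encoded in the slug.
slug = quote("My Org/NA", safe="")
print(requests.get(f"{base}/api/v2/organizations/{slug}/", auth=auth).status_code)

# Related URLs should resolve through the same named URL.
print(requests.get(f"{base}/api/v2/organizations/{slug}/teams/", auth=auth).status_code)
```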

View File

@ -9,7 +9,8 @@ A Notification Template is an instance of a notification type (Email, Slack, Web
At a high level, the typical notification task flow is:
* User creates a `NotificationTemplate` at `/api/v2/notification_templates/`.
* User assigns the notification to any of the various objects that support it (all variants of Job Templates as well as organizations and projects) and at the appropriate trigger level for which they want the notification (error, success, or any). For example, a user may wish to assign a particular Notification Template to trigger when `Job Template 1` fails.
* User assigns the notification to any of the various objects that support it (all variants of Job Templates as well as organizations and projects) and at the appropriate trigger level for which they want the notification (error, success, or start). For example, a user may wish to assign a particular Notification Template to trigger when `Job Template 1` fails.
## Templated Notification Messages
@ -53,19 +54,21 @@ Notification templates assigned at certain levels will inherit notifications def
* Inventory Updates will use notifications defined on the Organization they are in.
* Ad-hoc commands will use notifications defined on the Organization with which that inventory is associated.
## Workflow
When a job starts, succeeds, or fails, the running, success, or error handler, respectively, will pull a list of relevant notifications using the procedure defined above. It then creates a Notification object for each one containing relevant details about the job, and then **sends** it to the destination (email addresses, Slack channel(s), SMS numbers, etc.). These Notification objects are available as related resources on job types (Jobs, Inventory Updates, Project Updates), and also at `/api/v2/notifications`. You may also see which notifications have been sent by examining the job's related resources.
Notifications can succeed or fail, but that will _not_ cause their associated job to succeed or fail. The status of a notification can be viewed at its detail endpoint: `/api/v2/notifications/<n>`.
## Testing Notifications Before Using Them
Once a Notification Template has been created, its configuration can be tested by utilizing the endpoint at `/api/v2/notification_templates/<n>/test`. This will emit a test notification given the configuration defined by the notification template. These test notifications will also appear in the notifications list at `/api/v2/notifications`.
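For example, a minimal sketch with the `requests` library (hypothetical host, credentials, and template ID):
```python
import requests

resp = requests.post(
    "https://awx.example.com/api/v2/notification_templates/5/test/",  # hypothetical host and ID
    auth=("admin", "password"),
)
# The response references the test notification that was created; its status
# can then be checked in the list at /api/v2/notifications/.
print(resp.status_code, resp.json())
```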
# Notification Types
The currently defined Notification Types are:
The currently defined Notification Types are:
* Email
* Slack
@ -97,7 +100,7 @@ The following should be performed for good acceptance:
Set up a local SMTP mail service. Some options are listed below:
* Postfix service on galaxy: https://galaxy.ansible.com/debops/postfix/
* Postfix service on Galaxy: https://galaxy.ansible.com/debops/postfix/
* Mailtrap has a good free plan that should provide all of the features necessary: https://mailtrap.io/
* Another option is to use a Docker container: `docker run --network="tools_default" -p 25:25 -e maildomain=mail.example.com -e smtp_user=user:pwd --name postfix -d catatnight/postfix`
@ -214,17 +217,17 @@ There are a few modern IRC servers to choose from. [InspIRCd](http://www.inspirc
## Webhook
The webhook notification type in Ansible Tower provides a simple interface for sending `POST`s to a predefined web service. Tower will `POST` to this address using `application/json` content type with the data payload containing all relevant details in json format.
The webhook notification type in Ansible Tower provides a simple interface for sending `POST`s to a predefined web service. Tower will `POST` to this address using `application/json` content type with the data payload containing all relevant details in JSON format.
The parameters are fairly straightforward:
* `url`: The full URL that will be `POST`ed to
* `headers`: Headers in json form where the keys and values are strings. For example: `{"Authentication": "988881adc9fc3655077dc2d4d757d480b5ea0e11", "MessageType": "Test"}`
* `headers`: Headers in JSON form where the keys and values are strings. For example: `{"Authentication": "988881adc9fc3655077dc2d4d757d480b5ea0e11", "MessageType": "Test"}`
### Test Considerations
* Test both HTTP and HTTPS services; also specifically test HTTPS with a self-signed cert.
* Verify that the headers and payload are present, that the payload is json, and the content type is specifically `application/json`
* Verify that the headers and payload are present, that the payload is JSON, and that the content type is specifically `application/json`.
### Test Service
@ -240,7 +243,7 @@ Note that this won't respond correctly to the notification, so it will yield an
https://gist.github.com/matburt/73bfbf85c2443f39d272
The link below shows how to define an endpoint and parse headers and json content. It doesn't show how to configure Flask for HTTPS, but is fairly straightforward:
The link below shows how to define an endpoint and parse headers and JSON content. It doesn't show how to configure Flask for HTTPS, but is fairly straightforward:
http://flask.pocoo.org/snippets/111/
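For reference, a minimal sketch of such an endpoint (HTTP only, hypothetical route and port, not the snippet from the link above) could look like:
```python
from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route("/notify", methods=["POST"])
def notify():
    # Dump the headers and JSON body Tower sent so they can be inspected.
    print(dict(request.headers))
    print(request.get_json())
    return jsonify({"received": True})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8085)
```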
You can also link an `httpbin` service to the development environment for testing webhooks using:

View File

@ -1,62 +1,77 @@
# awx
awx provides a web interface and distributed task engine for scheduling and
# AWX
AWX provides a web interface and distributed task engine for scheduling and
running Ansible playbooks. As such, it relies heavily on the interfaces
provided by Ansible. This document provides a bird's-eye view of the notable
touchpoints between awx and Ansible.
touchpoints between AWX and Ansible.
## Terminology
awx has a variety of concepts which map to components of Ansible, or
AWX has a variety of concepts which map to components of Ansible, or
which further abstract them to provide functionality on top of Ansible. A few
of the most notable ones are:
### Projects
Projects represent a collection of Ansible playbooks. Most awx users create
Projects represent a collection of Ansible playbooks. Most AWX users create
Projects that import periodically from source control systems (such as git,
mercurial, or subversion repositories). This import is accomplished via an
ansible playbook included with awx (which makes use of the various source
Ansible playbook included with AWX (which makes use of the various source
control management modules in Ansible).
### Inventories
awx manages Inventories, Groups, and Hosts, and provides a RESTful interface
AWX manages Inventories, Groups, and Hosts, and provides a RESTful interface
that maps to static and dynamic Ansible inventories. Inventory data can
be entered into awx manually, but many users perform Inventory Syncs to import
be entered into AWX manually, but many users perform Inventory Syncs to import
inventory data from a variety of external sources.
### Job Templates
A Job Template is a definition and set of parameters for running
`ansible-playbook`. It defines metadata about a given playbook run, such as:
* a named identifier
* an associated inventory to run against
* the project and `.yml` playbook to run
* a variety of other options which map directly to ansible-playbook
arguments (extra_vars, verbosity, forks, limit, etc...)
* a variety of other options which map directly to `ansible-playbook`
arguments (`extra_vars`, verbosity, forks, limit, etc...)
### Credentials
awx stores sensitive credential data which can be attached to `ansible-playbook`
AWX stores sensitive credential data which can be attached to `ansible-playbook`
processes that it runs. This data can be oriented towards SSH connection
authentication (usernames, passwords, SSH keys and passphrases),
ansible-specific prompts (such as Vault passwords), or environmental
Ansible-specific prompts (such as Vault passwords), or environmental
authentication values which various Ansible modules depend on (such as setting
`AWS_ACCESS_KEY_ID` in an environment variable, or specifying
`ansible_ssh_user` as an extra variable).
## Canonical Example
Bringing all of this terminology together, a "Getting Started using AWX" might
Bringing all of this terminology together, a "Getting Started Using AWX" might
involve:
* Creating a new Project that imports playbooks from e.g., a remote git repository
* Creating a new Project that imports playbooks from, for example, a remote git repository
* Manually creating or importing an Inventory which defines where the playbook(s) will run
* Optionally, saving a Credential which contains SSH authentication details for
the host(s) where the playbook will run
* Creating a Job Template that specifies which Project and playbook to run and
where to run it (Inventory), and any necessary Credentials for e.g., SSH
authentication
where to run it (Inventory), and any necessary Credentials (*e.g.*, SSH
authentication)
* Launching the Job Template and viewing the results
## awx's Interaction with Ansible
The touchpoints between awx and Ansible are mostly encompassed by
everything that happens *after* a job is started in awx. Specifically, this
## AWX's Interaction with Ansible
The touchpoints between AWX and Ansible are mostly encompassed by
everything that happens *after* a job is started in AWX. Specifically, this
includes:
* Any time a Job Template is launched
@ -64,49 +79,57 @@ includes:
* Any time an Inventory Sync is performed
* Any time an Ad Hoc Command is run
### Spawning Ansible Processes
awx relies on a handful of stable interfaces in its interaction with Ansible.
AWX relies on a handful of stable interfaces in its interaction with Ansible.
The first of these are the actual CLI for `ansible-playbook` and
`ansible-inventory`.
When a Job Template or Project Update is run in awx, an actual
When a Job Template or Project Update is run in AWX, an actual
`ansible-playbook` command is composed and spawned in a pseudoterminal on one
of the servers/containers that make up the awx installation. This process runs
of the servers/containers that make up the AWX installation. This process runs
until completion (or until a configurable timeout), and the return code,
stdout, and stderr of the process are recorded in the awx database. Adhoc
`stdout`, and `stderr` of the process are recorded in the AWX database. Ad hoc
commands work the same way, though they spawn `ansible` processes instead of
`ansible-playbook`.
Similarly, when an Inventory Sync runs, an actual `ansible-inventory` process
runs, and its output is parsed and persisted into the awx database as Hosts and
runs, and its output is parsed and persisted into the AWX database as Hosts and
Groups.
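For a rough sense of that interface, `ansible-inventory --list` emits JSON that can be parsed along these lines (a simplified sketch, not the AWX importer):
```python
import json
import subprocess

def sync_inventory(inventory_path):
    # Run ansible-inventory and parse its JSON output into hosts and groups,
    # roughly the shape AWX persists after an Inventory Sync.
    proc = subprocess.run(
        ["ansible-inventory", "-i", inventory_path, "--list"],
        capture_output=True, check=True, text=True,
    )
    data = json.loads(proc.stdout)
    hosts = sorted(data.get("_meta", {}).get("hostvars", {}))
    groups = sorted(g for g in data if g != "_meta")
    return hosts, groups
```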
awx relies on stability in CLI behavior to function properly across Ansible
AWX relies on stability in CLI behavior to function properly across Ansible
releases; this includes the actual CLI arguments _and_ the behavior of task
execution and prompts (such as password, become, and Vault prompts).
execution and prompts (such as password, `become`, and Vault prompts).
### Capturing Event Data
awx applies an Ansible callback plugin to all `ansible-playbook` and `ansible`
AWX applies an Ansible callback plugin to all `ansible-playbook` and `ansible`
processes it spawns. This allows Ansible events to be captured and persisted
into the awx database; this process is what drives the "streaming" web UI
you'll see if you launch a job from the awx web interface and watch its results
appears on the screen. awx relies on stability in this plugin interface, the
into the AWX database; this process is what drives the "streaming" web UI
you'll see if you launch a job from the AWX web interface and watch its results
appear on the screen. AWX relies on stability in this plugin interface, the
hierarchy of emitted events based on strategy, and _especially_ the structure
of event data to work across Ansible releases:
![Event Data Diagram](https://user-images.githubusercontent.com/722880/35641610-ae7f1dea-068e-11e8-84fb-0f96043d53e4.png)
### Fact Caching
awx provides a custom fact caching implementation that allows users to store
facts for playbook runs across subsequent Job Template runs. Specifically, awx
AWX provides a custom fact caching implementation that allows users to store
facts for playbook runs across subsequent Job Template runs. Specifically, AWX
makes use of the `jsonfile` fact cache plugin; after `ansible-playbook` runs
have exited, awx consumes the entire `jsonfile` cache and persists it in the
awx database. On subsequent Job Template runs, prior `jsonfile` caches are
have exited, AWX consumes the entire `jsonfile` cache and persists it in the
AWX database. On subsequent Job Template runs, prior `jsonfile` caches are
restored to the local file system so the new `ansible-playbook` process makes
use of them.
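As a rough illustration of that flow (not the actual AWX code), consuming a `jsonfile` cache directory and collecting per-host facts might look like:
```python
import json
import os

def consume_fact_cache(cache_dir):
    # The jsonfile cache plugin writes one JSON file per host; read each file
    # and return a hostname -> facts mapping for persistence elsewhere.
    facts = {}
    for hostname in os.listdir(cache_dir):
        with open(os.path.join(cache_dir, hostname)) as f:
            facts[hostname] = json.load(f)
    return facts
```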
### Environment-Based Configuration
awx injects credentials and module configuration for a number of Ansible
AWX injects credentials and module configuration for a number of Ansible
modules via environment variables. Examples include:
* `ANSIBLE_NET_*` and other well-known environment variables for network device authentication
@ -114,5 +137,5 @@ modules via environment variables. Examples include:
(`AWS_ACCESS_KEY_ID`, `GCE_EMAIL`, etc...)
* SSH-oriented configuration flags, such as `ANSIBLE_SSH_CONTROL_PATH`
awx relies on stability in these configuration options to reliably support
AWX relies on stability in these configuration options to reliably support
credential injection for supported Ansible modules.
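The general idea, sketched very loosely (the variable names shown are only examples, and this is not how AWX itself composes the environment):
```python
import os
import subprocess

def spawn_with_credentials(playbook, injected_env):
    # Layer credential-derived variables on top of the current environment
    # before spawning ansible-playbook so the relevant modules can use them.
    env = os.environ.copy()
    env.update(injected_env)  # e.g. {"AWS_ACCESS_KEY_ID": "...", "AWS_SECRET_ACCESS_KEY": "..."}
    return subprocess.run(["ansible-playbook", playbook], env=env)
```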

View File

@ -1,14 +1,15 @@
## Process Isolation Overview
In older version of Ansible Tower we used a system called `proot` to isolate tower job processes from the rest of the system.
In older versions of Ansible Tower, we used a system called `proot` to isolate Tower job processes from the rest of the system.
For Tower 3.1 and later we have switched to using `bubblewrap` which is a much lighter weight and maintained process isolation system.
Tower version 3.1 and later switched to using `bubblewrap`, which is a much lighter-weight and maintained process isolation system.
Tower 3.5 and later uses the process isolation feature in Ansible runner to achieve process isolation.
Tower 3.5 and later uses the process isolation feature in `ansible-runner` to achieve process isolation.
### Activating Process Isolation
By default `bubblewrap` is enabled, this can be turned off via Tower Config or from a tower settings file:
`bubblewrap` is enabled by default; it can be turned off via Tower Config or from a Tower settings file:
AWX_PROOT_ENABLED = False
@ -17,16 +18,17 @@ Process isolation, when enabled, will be used for the following Job Types:
* Job Templates - Launching jobs from regular job templates
* Ad-hoc Commands - Launching ad-hoc commands against one or more hosts in inventory
### Tunables
Process Isolation will, by default, hide the following directories from the tasks mentioned above:
* /etc/tower - To prevent exposing Tower configuration
* /var/lib/awx - With the exception of the current project being used (for regular job templates)
* /var/log
* /tmp (or whatever the system temp dir is) - With the exception of the processes's own temp files
* `/etc/tower` - To prevent exposing Tower configuration
* `/var/lib/awx` - With the exception of the current project being used (for regular job templates)
* `/var/log`
* `/tmp` (or whatever the system temp directory is) - With the exception of the process's own temp files
If there is other information on the system that is sensitive and should be hidden that can be added via the Tower Configuration Screen
If there is other information on the system that is sensitive and should be hidden, it can be added via the Tower Configuration Screen
or by updating the following entry in a Tower settings file:
AWX_PROOT_HIDE_PATHS = ['/list/of/', '/paths']
@ -35,10 +37,11 @@ If there are any directories that should specifically be exposed that can be set
AWX_PROOT_SHOW_PATHS = ['/list/of/', '/paths']
By default the system will use the system's tmp dir (/tmp by default) as it's staging area. This can be changed:
By default, the system will use the system's `tmp dir` (`/tmp` by default) as its staging area. This can be changed via the following setting:
AWX_PROOT_BASE_PATH = "/opt/tmp"
### Project Folder Isolation
Starting in AWX versions above 6.0.0, the project folder will be copied for each job run.

View File

@ -1,19 +1,18 @@
# Prometheus Container
## Development
AWX comes with an example prometheus container and make target. To use it:
AWX comes with an example Prometheus container and `make` target. To use it:
1. Edit `tools/prometheus/prometheus.yml` and update the `basic_auth` section
to specify a valid user/password for an AWX user you've created.
Alternatively, you can provide an OAuth2 token (which can be generated at
`/api/v2/users/N/personal_tokens/`).
> Note: By default, the config assumes a user with username=admin and password=password.
2. Start the Prometheus container:
`make prometheus`
3. The Prometheus UI will now be accessible at `http://localhost:9090/graph`.
There should be no extra setup needed. You can try executing this query in the
There should be no extra setup needed. You can try executing this query in the
UI to get back the number of active sessions: `awx_sessions_total`

View File

@ -1,19 +1,19 @@
## Launch-time Configurations / Prompting
Admins of templates in AWX have the option to allow fields to be over-written
Admins of templates in AWX have the option to allow fields to be overwritten
by user-provided values at the time of launch. The job that runs will
then use the launch-time values in lieu of the template values.
Fields that can be prompted for, and corresponding "ask_" variables
Fields that can be prompted for, and corresponding `"ask_"` variables
(which exist on the template and must be set to `true` to enable prompting)
are the following.
are the following:
##### Standard Pattern with Character Fields
- `ask_<variable>_on_launch` allows use of
- `<variable>`
##### Standard Pattern With Character Fields
The standard pattern applies to fields
- `ask_<variable>_on_launch` allows use of `<variable>`
The standard pattern applies to the following fields:
- `job_type`
- `skip_tags`
@ -22,27 +22,23 @@ The standard pattern applies to fields
- `verbosity`
- `scm_branch`
##### Non-Standard Cases
- `ask_variables_on_launch` allows unrestricted use of
- `extra_vars`
- `ask_tags_on_launch` allows use of
- `job_tags`
- Enabled survey allows restricted use of
- `extra_vars`, only for variables in survey (with qualifiers)
- `ask_credential_on_launch` allows use of
- `credentials`
- `ask_inventory_on_launch` allows use of
- `inventory`
- `ask_variables_on_launch` allows unrestricted use of `extra_vars`
- `ask_tags_on_launch` allows use of `job_tags`
- Enabled survey allows restricted use of `extra_vars`, only for variables in survey (with qualifiers)
- `ask_credential_on_launch` allows use of `credentials`
- `ask_inventory_on_launch` allows use of `inventory`
Surveys are a special case of prompting for variables - applying a survey to
a template white-lists variable names in the survey spec (requires the survey
a template whitelists variable names in the survey spec (requires the survey
spec to exist and `survey_enabled` to be true). On the other hand,
if `ask_variables_on_launch` is true, users can provide any variables in
extra_vars.
`extra_vars`.
Prompting enablement for all types of credentials is controlled by `ask_credential_on_launch`.
Clients can manually provide a list of credentials of any type, but only 1 of _each_ type, in
Clients can manually provide a list of credentials of any type, but only one of _each_ type, in
`credentials` on a POST to the launch endpoint.
If the job is being spawned by a saved launch configuration (such as a schedule),
credentials are managed by the many-to-many relationship `credentials` relative
@ -51,27 +47,26 @@ The credentials in this relationship will either add to the job template's
credential list, or replace a credential in the job template's list if it
is the same type.
### Manual use of Prompts
### Manual Use of Prompts
Fields enabled as prompts in the template can be used for the following
actions in the API.
actions in the API:
- POST to `/api/v2/job_templates/N/launch/`
- can accept all prompt-able fields
- POST to `/api/v2/workflow_job_templates/N/launch/`
- can accept certain fields, see `workflow.md`
- POST to `/api/v2/system_job_templates/N/launch/`
- can accept certain fields, with no user configuration
- POST to `/api/v2/job_templates/N/launch/` (can accept all prompt-able fields; see the sketch below)
- POST to `/api/v2/workflow_job_templates/N/launch/` (can accept certain fields, see `workflow.md`)
- POST to `/api/v2/system_job_templates/N/launch/` (can accept certain fields, with no user configuration)
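As an illustration of the first action, a manual launch that provides prompted fields might look like the following sketch (hypothetical host, credentials, and template ID):
```python
import requests

payload = {
    "job_type": "check",
    "limit": "host1,host2",
    "extra_vars": {"package_version": "1.2.3"},
    "credentials": [1, 2, 5],
}
resp = requests.post(
    "https://awx.example.com/api/v2/job_templates/42/launch/",  # hypothetical
    json=payload,
    auth=("admin", "password"),
)
# Fields that are not enabled for prompting are ignored and reported back
# under "ignored_fields" in the response.
print(resp.status_code, resp.json().get("ignored_fields"))
```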
When launching manually, certain restrictions apply to the use of credentials
- if providing deprecated `extra_credentials` this becomes the "legacy" method,
When launching manually, certain restrictions apply to the use of credentials:
- If providing deprecated `extra_credentials`, this becomes the "legacy" method
and imposes additional restrictions on relaunch,
and is mutually exclusive with the use of the `credentials` field
- if providing `credentials`, existing credentials on the job template may
- If providing `credentials`, existing credentials on the job template may
only be removed if replaced by another credential of the same type;
this is so that relaunch will use the up-to-date credential on the template
if it has been edited since the prior launch.
#### Data Rules for Prompts
For the POST action to launch, data for "prompts" are provided as top-level
@ -80,8 +75,8 @@ provided for `credentials`, which is otherwise not possible in AWX API design.
The list of credentials provided in the POST data will become the list
for the spawned job.
Values of `null` are not allowed, if the field is not being over-ridden,
the key should not be given in the payload. A 400 should be returned if
Values of `null` are not allowed; if the field is not being overridden,
the key should not be given in the payload. A `400` should be returned if
this is done.
Example:
@ -97,7 +92,7 @@ POST to `/api/v2/job_templates/N/launch/` with data:
}
```
where the job template has credentials `[2, 3, 5]`, and the credential type
...where the job template has credentials `[2, 3, 5]`, and the credential types
are the following:
- 1 - gce
@ -106,7 +101,7 @@ are the following:
- 4 - aws
- 5 - openstack
Assuming that the job template is configured to prompt for all these,
Assuming that the job template is configured to prompt for all of these
fields, here is what happens in this action:
- `job_type` of the job takes the value of "check"
@ -117,27 +112,28 @@ fields, here is what happens in this action:
- `extra_vars` of the job template will be used without any overrides
If `extra_vars` in the request data contains some keys, these will
be combined with the job template extra_vars dictionary, with the
be combined with the job template `extra_vars` dictionary, with the
request data taking precedence.
Provided credentials will replace any job template credentials of the same
exclusive type. In the example, the job template
credential 3 was replaced with the provided credential 1, because a job
may only use 1 gce credential because these two credentials define the
Credential 3 was replaced with the provided Credential 1, because a job
may only use one GCE credential, as these two credentials define the
same environment variables and configuration file.
If the job had not provided the credential 1, a 400 error would have been
If the job had not provided Credential 1, a 400 error would have been
returned because the job must contain the same types of credentials as its
job template.
### Saved Launch-time Configurations
Several other mechanisms which automatically launch jobs can apply prompts
at launch-time that are saved in advance.
at launch-time that are saved in advance:
- Workflow nodes
- Schedules
- Job relaunch / re-scheduling
- (partially) workflow job templates
- (partially) Workflow job templates
In the case of workflow nodes and schedules, the prompted fields are saved
directly on the model. Those models include Workflow Job Template Nodes,
@ -153,6 +149,7 @@ and only used to prepare the correct launch-time configuration for subsequent
re-launch and re-scheduling of the job. To see these prompts for a particular
job, do a GET to `/api/v2/jobs/N/create_schedule/`.
#### Workflow Node Launch Configuration
Workflow job nodes will combine `extra_vars` from their parent
@ -170,35 +167,38 @@ If the node's job template has `ask_inventory_on_launch` set to false and
the node provides an inventory, this resource will not be used in the spawned
job. If a user creates a node that would do this, a 400 response will be returned.
#### Workflow Job Template Prompts
Workflow JTs are different than other cases, because they do not have a
Workflow job templates are different from other cases because they do not have a
template directly linked, so their prompts are a form of action-at-a-distance.
When the node's prompts are gathered to spawn its job, any prompts from the workflow job
will take precedence over the node's value.
As a special exception, `extra_vars` from a workflow will not obey JT survey
and prompting rules, both both historical and ease-of-understanding reasons.
As a special exception, `extra_vars` from a workflow will not obey the job template survey
and prompting rules, both for historical and ease-of-understanding reasons.
This behavior may change in the future.
Other than that exception, JT prompting rules are still adhered to when
Other than that exception, job template prompting rules are still adhered to when
a job is spawned.
#### Job Relaunch and Re-scheduling
Job relaunch does not allow user to provide any prompted fields at the time of relaunch.
Job relaunch does not allow a user to provide any prompted fields at the time of relaunch.
Relaunching will re-apply all the prompts used at the
time of the original launch. This means that:
- all prompts restrictions apply as-if the job was being launched with the
- All prompt restrictions apply as if the job were being launched with the
current job template (even if it has been modified)
- RBAC rules for prompted resources still apply
Those same rules apply when created a schedule from the
Those same rules apply when creating a schedule from the
`/api/v2/schedule_job/` endpoint.
Jobs orphaned by a deleted job template can be relaunched,
but only with organization or system administrator privileges.
but only with Organization or System Administrator privileges.
#### Credential Password Prompting Restriction
@ -208,6 +208,7 @@ of a saved launch-time configuration. This is for security reasons.
Credential passwords _can_ be provided at time of relaunch.
### Validation
The general rule for validation:
@ -219,6 +220,7 @@ In other words, if no prompts (including surveys) are configured, a job
must be identical to the template it was created from, for all fields
that become `ansible-playbook` options.
#### Disallowed Fields
If a manual launch provides fields not allowed by the rules of the template,
@ -227,6 +229,7 @@ the behavior is:
- Launches without those fields, ignoring them
- Lists the ignored fields in `ignored_fields` in the POST response
#### Data Type Validation
All fields provided on launch, or saved in a launch-time configuration
@ -237,11 +240,13 @@ if saving to the job template model. For example, only certain values of
Surveys impose additional restrictions, and violations of the survey
validation rules will prevent launch from proceeding.
#### Fields Required on Launch
Failing to provide required variables also results in a validation error
when manually launching. It will also result in a 400 error if the user
fails to provide those fields when saving a WFJT node or schedule.
fails to provide those fields when saving a workflow job template node or schedule.
#### Broken Saved Configurations
@ -254,26 +259,31 @@ launched (typical example is a null `inventory`), then the job should be
created in an "error" state with `job_explanation` containing a summary
of what happened.
### Scenarios to have Coverage for
- variable precedence
- schedule has survey answers for WFJT survey
- WFJT has node that has answers to JT survey
- on launch, the schedule answers override all others
- survey password durability
- schedule has survey password answers from WFJT survey
- WFJT node has answers to different password questions from JT survey
- Saving with "$encrypted$" value will either
### Scenarios to Cover
**Variable Precedence**
- Schedule has survey answers for workflow job template survey
- Workflow job template has node that has answers to job template survey
- On launch, the schedule answers override all others
**Survey Password Durability**
- Schedule has survey password answers from workflow job template survey
- Workflow job template node has answers to different password questions from job template survey
- Saving with `"$encrypted$"` value will either:
- become a no-op, removing the key if a valid question default exists
- replace with the database value if question was previously answered
- final job it spawns has both answers encrypted
- POST to associate credential to WFJT node
- requires admin to WFJT and execute to JT
- this is in addition to the restriction of `ask_credential_on_launch`
- credentials merge behavior
- JT has machine & cloud credentials, set to prompt for credential on launch
- schedule for JT provides no credentials
- spawned job still uses all JT credentials
- credentials deprecated behavior
- manual launch providing `"extra_credentials": []` should launch with no job credentials
- such jobs cannot have schedules created from them
- Final job it spawns has both answers encrypted
**POST to Associate Credential to Workflow Job Template Node**
- Requires admin to WFJT and execute to job template
- This is in addition to the restriction of `ask_credential_on_launch`
**Credentials Merge Behavior**
- Job template has machine & cloud credentials, set to prompt for credential on launch
- Schedule for job template provides no credentials
- Spawned job still uses all job template credentials
**Credentials Deprecated Behavior**
- Manual launch providing `"extra_credentials": []` should launch with no job credentials
- Such jobs cannot have schedules created from them

View File

@ -7,13 +7,13 @@ The intended audience of this document is the Ansible Tower developer.
### RBAC - System Basics
There are three main concepts to be familiar with, Roles, Resources, and Users.
There are three main concepts to be familiar with: Roles, Resources, and Users.
Users can be members of a role, which gives them certain access to any
resources associated with that role, or any resources associated with "descendent"
roles.
For example, if I have an organization named "MyCompany" and I want to allow
two people, "Alice", and "Bob", access to manage all the settings associated
two people, "Alice", and "Bob", access to manage all of the settings associated
with that organization, I'd make them both members of the organization's `admin_role`.
It is often the case that you have many Roles in a system, and you want some
@ -21,9 +21,9 @@ roles to include all of the capabilities of other roles. For example, you may
want a System Administrator to have access to everything that an Organization
Administrator has access to, who has everything that a Project Administrator
has access to, and so on. We refer to this concept as the 'Role Hierarchy', and
is represented by allowing Roles to have "Parent Roles". Any permission that a
Role has is implicitly granted to any parent roles (or parents of those
parents, and so on). Of course Roles can have more than one parent, and
is represented by allowing roles to have "Parent Roles". Any permission that a
role has is implicitly granted to any parent roles (or parents of those
parents, and so on). Of course roles can have more than one parent, and
capabilities are implicitly granted to all parents. (Technically speaking, this
forms a directed acyclic graph instead of a strict hierarchy, but the
concept should remain intuitive.)
@ -34,10 +34,10 @@ concept should remain intuitive.)
### Implementation Overview
The RBAC system allows you to create and layer roles for controlling access to resources. Any Django Model can
be made into a resource in the RBAC system by using the `ResourceMixin`. Once a model is accessible as a resource you can
be made into a resource in the RBAC system by using the `ResourceMixin`. Once a model is accessible as a resource, you can
extend the model definition to have specific roles using the `ImplicitRoleField`. Within the declaration of
this role field you can also specify any parents the role may have, and the RBAC system will take care of
all the appropriate ancestral binding that takes place behind the scenes to ensure that the model you've declared
all of the appropriate ancestral binding that takes place behind the scenes to ensure that the model you've declared
is kept up to date as the relations in your model change.
### Roles
@ -52,7 +52,7 @@ what roles are checked when accessing a resource.
| -- AdminRole
|-- parent = ResourceA.AdminRole
When a user attempts to access ResourceB we will check for their access using the set of all unique roles, including the parents.
When a user attempts to access ResourceB, we will check for their access using the set of all unique roles, including the parents.
ResourceA.AdminRole, ResourceB.AdminRole
@ -60,7 +60,7 @@ This would provide any members of the above roles with access to ResourceB.
#### Singleton Role
There is a special case _Singleton Role_ that you can create. This type of role is for system wide roles.
There is a special case _Singleton Role_ that you can create. This type of role is for system-wide roles.
### Models
@ -72,7 +72,7 @@ The RBAC system defines a few new models. These models represent the underlying
##### `visible_roles(cls, user)`
`visible_roles` is a class method that will lookup all of the `Role` instances a user can "see". This includes any roles the user is a direct decendent of as well as any ancestor roles.
`visible_roles` is a class method that will look up all of the `Role` instances a user can "see". This includes any roles the user is a direct descendent of as well as any ancestor roles.
##### `singleton(cls, name)`
@ -137,7 +137,7 @@ By mixing in the `ResourceMixin` to your model, you are turning your model in to
## Usage
After exploring the _Overview_ the usage of the RBAC implementation in your code should feel unobtrusive and natural.
After exploring the _Overview_, the usage of the RBAC implementation in your code should feel unobtrusive and natural.
```python
# make your model a Resource
@ -150,7 +150,7 @@ After exploring the _Overview_ the usage of the RBAC implementation in your code
)
```
Now that your model is a resource and has a `Role` defined, you can begin to access the helper methods provided to you by the `ResourceMixin` for checking a users access to your resource. Here is the output of a Python REPL session.
Now that your model is a resource and has a `Role` defined, you can begin to access the helper methods provided to you by the `ResourceMixin` for checking a user's access to your resource. Here is the output of a Python REPL session:
```python
# we've created some documents and a user

View File

@ -1,27 +1,29 @@
Starting from Tower 3.3 and API v2, user are able to copy some existing resource objects to quickly
create new resource objects via POSTing to corresponding `/copy/` endpoint. A new `CopyAPIView` class
Starting from Tower 3.3 and API V2, users are able to copy some existing resource objects to quickly
create new resource objects via POSTing to the corresponding `/copy/` endpoint. A new `CopyAPIView` class
is introduced as the base view class for `/copy/` endpoints. It mimics the process of manually fetching
fields from the existing object to create a new object, plus the ability to automatically detect sub
structures of existing objects and make a background task-based deep copy when necessary.
## Usage
If an AWX resource is copiable, all of its object detail API views will have a related URL field
`"copy"`, which has form `/api/<version>/<resource name>/<object pk>/copy/`. GET to this endpoint
If an AWX resource is able to be copied, all of its object detail API views will have a related URL field
`"copy"`, which has the form `/api/v2/<resource name>/<object pk>/copy/`. A GET to this endpoint
will return `can_copy`, which is a boolean indicating whether the current user can execute a copy
operation; POST to this endpoint actually copies the resource object. One field `name` is required
which will later be used as the name of the created copy. Upon success, 201 will be returned, along
operation; POSTing to this endpoint actually copies the resource object. One field, `name`, is required;
this will later be used as the name of the created copy. Upon success, a 201 will be returned, along
with the created copy.
For some resources like credential, the copy process is not time-consuming, thus the entire copy
process will take place in the request-response cycle, and the created object copy is returned as
For some resources like credentials, the copy process is not time-consuming, thus the entire copy
process will take place in the request-response cycle, and the created object copy is returned as a
POST response.
For some other resources like inventory, the copy process can take longer, depending on the number
of sub-objects to copy (will explain later). Thus, although the created copy will be returned, the
For some other resources like inventories, the copy process can take longer, depending on the number
of sub-objects to copy (this will be explained later in this document). Thus, although the created copy will be returned, the
copy process is not finished yet. All sub-objects (like all hosts and groups of an inventory) will
not be created until after the background copy task is finished in success.
not be created until after the background copy task is finished successfully.
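For example, copying a job template with the `requests` library might look like this sketch (hypothetical host, credentials, and object ID):
```python
import requests

base = "https://awx.example.com/api/v2"   # hypothetical host
auth = ("admin", "password")              # hypothetical credentials

# Check whether the current user may copy job template 42.
print(requests.get(f"{base}/job_templates/42/copy/", auth=auth).json())

# Perform the copy; `name` is required and becomes the new object's name.
resp = requests.post(
    f"{base}/job_templates/42/copy/",
    json={"name": "Copy of My Job Template"},
    auth=auth,
)
print(resp.status_code)  # 201 on success, with the created copy in the body
```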
Currently the available list of copiable resources are:
Currently, the copiable resources are:
- job templates
- projects
@ -31,20 +33,22 @@ Currently the available list of copiable resources are:
- notifications
- inventory scripts
For most of the resources above, only the object to be copied itself will be copied; For some resources
For most of the resources above, only the object to be copied itself will be copied; for some resources
like inventories, however, sub resources belonging to the resource will also be copied to maintain the
full functionality of the copied new resource. In specific:
full functionality of the copied new resource. Specifically:
- When an inventory is copied, all its hosts, groups and inventory sources are copied.
- When a workflow job template is copied, all its workflow job template nodes are copied.
- When an inventory is copied, all of its hosts, groups and inventory sources are copied.
- When a workflow job template is copied, all of its workflow job template nodes are copied.
## How to Add a Copy Endpoint for a Resource
## How to add a copy end-point for a resource
The copy behavior of different resources largely follows the same pattern; therefore, a unified way of
enabling copy capability for resources is available for developers:
enabling copy capability for resources is available for developers.
Firstly, create a `/copy/` url endpoint for the target resource.
First, create a `/copy/` URL endpoint for the target resource.
Secondly, create a view class as handler to `/copy/` endpoint. This view class should be subclassed
Second, create a view class as handler to the `/copy/` endpoint. This view class should be subclassed
from `awx.api.generics.CopyAPIView`. Here is an example:
```python
class JobTemplateCopy(CopyAPIView):
@ -52,12 +56,13 @@ class JobTemplateCopy(CopyAPIView):
model = JobTemplate
copy_return_serializer_class = JobTemplateSerializer
```
Note that the above example declares a custom class attribute, `copy_return_serializer_class`. This attribute
is used by `CopyAPIView` to render the created copy in the POST response, so in most cases the value should
be the same as `serializer_class` of corresponding resource detail view, like here the value is the
be the same as `serializer_class` of corresponding resource detail view; for example, here the value is the
`serializer_class` of `JobTemplateDetail`.
Thirdly, for the underlying model of the resource, Add 2 macros, `FIELDS_TO_PRESERVE_AT_COPY` and
Third, for the underlying model of the resource, add two macros, `FIELDS_TO_PRESERVE_AT_COPY` and
`FIELDS_TO_DISCARD_AT_COPY`, as needed. Here is an example:
```python
class JobTemplate(UnifiedJobTemplate, JobOptions, SurveyJobTemplateMixin, ResourceMixin):
@ -91,10 +96,10 @@ Lastly, unit test copy behavior of the new endpoint in `/awx/main/tests/function
update docs (like this doc).
Fields in `FIELDS_TO_PRESERVE_AT_COPY` must be solid model fields, while fields in
`FIELDS_TO_DISCARD_AT_COPY` do not need to be. Note there are hidden fields not visible from model
`FIELDS_TO_DISCARD_AT_COPY` do not need to be. Note that there are hidden fields not visible from the model
definition, namely reverse relationships and fields inherited from super classes or mix-ins. A helper
script `tools/scripts/list_fields.py` is available to inspect a model and list details of all its
available fields.
available fields:
```
# In shell_plus
>>> from list_fields import pretty_print_model_fields
@ -103,10 +108,10 @@ available fields.
`CopyAPIView` will automatically detect sub objects of an object, and do a deep copy of all sub objects
as a background task. There are sometimes permission issues with sub object copy. For example,
when copying nodes of a workflow job template, there are cases where the user performing copy has no use
permission of related credential and inventory of some nodes, and it is desired those fields will be
`None`. In order to do that, developer should provide a static method `deep_copy_permission_check_func`
under corresponding specific copy view. Like
when copying nodes of a workflow job template, there are cases where the user performing the copy has no 'use'
permission on the related credential and inventory of some nodes, and those fields should be
`None`. In order to do that, the developer should provide a static method `deep_copy_permission_check_func`
under the corresponding copy view:
```python
class WorkflowJobTemplateCopy(WorkflowsEnforcementMixin, CopyAPIView):
@ -122,45 +127,46 @@ class WorkflowJobTemplateCopy(WorkflowsEnforcementMixin, CopyAPIView):
# Other code
```
The static method `deep_copy_permission_check_func` must take exactly two arguments: `user`, the
user performing the copy; `new_objs`, a list of all sub objects of the created copy. Sub objects in
`new_objs` are initially populated disregarding any permission constraints, developer shall check
`user`'s permission against these new sub objects and react like unlink related objects or sending
warning logs. `deep_copy_permission_check_func` should not return anything.
user performing the copy, and `new_objs`, a list of all sub objects of the created copy. Sub objects in
`new_objs` are initially populated disregarding any permission constraints; the developer shall check
`user`'s permission against these new sub objects and unlink related objects or send
warning logs as necessary. `deep_copy_permission_check_func` should not return anything.
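A rough, hypothetical sketch of what such a check function might do (the `user_can_use` helper and the field names below are placeholders, not AWX APIs):
```python
import logging

logger = logging.getLogger(__name__)

def user_can_use(user, related_obj):
    # Stand-in for the project's real permission check (illustrative only).
    return getattr(user, "is_superuser", False)

class ExampleCopyView:
    @staticmethod
    def deep_copy_permission_check_func(user, new_objs):
        # Unlink related resources the copying user is not allowed to use,
        # and log a warning instead of failing the copy.
        for obj in new_objs:
            for field in ("inventory", "credential"):
                related = getattr(obj, field, None)
                if related is not None and not user_can_use(user, related):
                    setattr(obj, field, None)
                    logger.warning("Removed %s from %r for user %s", field, obj, user)
```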
Lastly, macro `REENCRYPTION_BLACKLIST_AT_COPY` is available as part of a model definition. It is a
list of field names which will escape re-encryption during copy. For example, `extra_data` field
list of field names which will escape re-encryption during copy. For example, the `extra_data` field
of workflow job template nodes.
## Acceptance Criteria
* Credentials should be able to copy themselves. The behavior of copying credential A shall be exactly
the same as creating a credential B with all needed fields for creation coming from credential A.
the same as creating a credential B with all necessary fields for creation coming from credential A.
* Inventories should be able to copy themselves. The behavior of copying inventory A shall be exactly
the same as creating an inventory B with all needed fields for creation coming from inventory A. Other
the same as creating an inventory B with all necessary fields for creation coming from inventory A. Other
than that, inventory B should inherit A's `instance_groups`, and have exactly the same host and group
structures as A.
* Inventory scripts should be able to copy themselves. The behavior of copying inventory script A
shall be exactly the same as creating an inventory script B with all needed fields for creation
shall be exactly the same as creating an inventory script B with all necessary fields for creation
coming from inventory script A.
* Job templates should be able to copy themselves. The behavior of copying job template A
shall be exactly the same as creating a job template B with all needed fields for creation
shall be exactly the same as creating a job template B with all necessary fields for creation
coming from job template A. Other than that, job template B should inherit A's `labels`,
`instance_groups`, `credentials` and `survey_spec`.
* Notification templates should be able to copy themselves. The behavior of copying notification
template A shall be exactly the same as creating a notification template B with all needed fields
template A shall be exactly the same as creating a notification template B with all necessary fields
for creation coming from notification template A.
* Projects should be able to copy themselves. The behavior of copying project A shall be the
same as creating a project B with all needed fields for creation coming from project A, except for
same as creating a project B with all necessary fields for creation coming from project A, except for
`local_path`, which will be populated by triggered project update. Other than that, project B
should inherit A's `labels`, `instance_groups` and `credentials`.
* Workflow Job templates should be able to copy themselves. The behavior of copying workflow job
template A shall be exactly the same as creating a workflow job template B with all needed fields
template A shall be exactly the same as creating a workflow job template B with all necessary fields
for creation coming from workflow job template A. Other than that, workflow job template B should
inherit A's `labels`, `instance_groups`, `credentials` and `survey_spec`, and have exactly the
same workflow job template node structure as A.
* In all copy processes, `name` field of the created copy of the original object should be able to
customize in the POST body.
* In all copy processes, the `name` field of the created copy of the original object should be customizable in the POST body.
* The permission for a user to make a copy for an existing resource object should be the same as the
permission for a user to create a brand new resource object using fields from the existing object.
* The RBAC behavior of the original workflow job template `/copy/` should be preserved. That is, if the
user has no necessary permission to the related project and credential of a workflow job template
user has no permission to access the related project and credential of a workflow job template
node, the copied workflow job template node should have those fields empty.

View File

@ -1,7 +1,7 @@
# Relaunch on Hosts with Status
This feature allows the user to relaunch a job, targeting only hosts marked
as failed in the original job.
This feature allows the user to relaunch a job, targeting only the hosts marked
as "failed" in the original job.
### Definition of "failed"
@ -10,27 +10,27 @@ is different from "hosts with failed tasks". Unreachable hosts can have
no failed tasks. This means that the count of "failed hosts" can be different
from the failed count, given in the summary at the end of a playbook.
This definition corresponds to Ansible .retry files.
This definition corresponds to Ansible `.retry` files.
### API Design of Relaunch
#### Basic Relaunch
POST to `/api/v2/jobs/N/relaunch/` without any request data should relaunch
A POST to `/api/v2/jobs/N/relaunch/` without any request data should relaunch
the job with the same `limit` value that the original job used, which
may be an empty string.
This is implicitly the "all" option below.
This is implicitly the "all" option, mentioned below.
#### Relaunch by Status
Providing request data containing `{"hosts": "failed"}` should change
the `limit` of the relaunched job to target failed hosts from the previous
job. Hosts will be provided as a comma-separated list in the limit. Formally,
these are options
these are options:
- all: relaunch without changing the job limit
- failed: relaunch against all hos
- failed: relaunch against all hosts
### Relaunch Endpoint
@ -60,12 +60,12 @@ then the request will be rejected. For example, if a GET yielded:
}
```
Then a POST of `{"hosts": "failed"}` should return a descriptive response
...then a POST of `{"hosts": "failed"}` should return a descriptive response
with a 400-level status code.
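A quick sketch of both relaunch variants with the `requests` library (hypothetical host, credentials, and job ID):
```python
import requests

base = "https://awx.example.com/api/v2"   # hypothetical host
auth = ("admin", "password")              # hypothetical credentials

# Relaunch with the original limit (the implicit "all" option).
requests.post(f"{base}/jobs/123/relaunch/", auth=auth)

# Relaunch only against the hosts that failed in the original run.
resp = requests.post(f"{base}/jobs/123/relaunch/", json={"hosts": "failed"}, auth=auth)
print(resp.status_code)  # expect a 400-level code if the original run had no failed hosts
```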
# Acceptance Criteria
Scenario: user launches a job against host "foobar", and the run fails
Scenario: User launches a job against host "foobar", and the run fails
against this host. User changes name of host to "foo", and relaunches job
against failed hosts. The `limit` of the relaunched job should reference
"foo" and not "foobar".
@ -79,9 +79,9 @@ relaunch the same way that relaunching has previously worked.
If a playbook provisions a host, this feature should behave reasonably
when relaunching against a status that includes these hosts.
Feature should work even if hosts have tricky characters in their names,
This feature should work even if hosts have tricky characters in their names,
like commas.
Also need to consider case where a task `meta: clear_host_errors` is present
inside a playbook, and that the retry subset behavior is the same as Ansible
One may also need to consider cases where a task `meta: clear_host_errors` is present
inside a playbook; the retry subset behavior is the same as Ansible's
for this case.

View File

@ -1,7 +1,6 @@
Scheduled Jobs
==============
## Scheduled Jobs
awx allows jobs to run on a schedule (with optional recurrence rules) via
AWX allows jobs to run on a schedule (with optional recurrence rules) via
an `HTTP POST` to a variety of API endpoints:
HTTP POST
@ -23,9 +22,10 @@ an `HTTP POST` to a variety of API endpoints:
specific example above would run a job every day - for seven consecutive days - starting
on January 15th, 2030 at noon (UTC).
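As a rough sketch of what creating such a schedule could look like (the endpoint, field names, auth, and the exact `rrule` value below are illustrative assumptions, not the elided example itself):

```python
import requests

AWX_URL = "https://awx.example.org"            # placeholder host
HEADERS = {"Authorization": "Bearer TOKEN"}    # placeholder auth token

# Daily, for seven consecutive days, starting 2030-01-15 at noon UTC.
rrule = "DTSTART:20300115T120000Z RRULE:FREQ=DAILY;INTERVAL=1;COUNT=7"

resp = requests.post(
    f"{AWX_URL}/api/v2/job_templates/10/schedules/",   # assumed schedules endpoint
    json={"name": "Seven daily runs", "rrule": rrule},
    headers=HEADERS,
)
resp.raise_for_status()
```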
Specifying Timezones
====================
`DTSTART` values provided to awx _must_ provide timezone information (they may
## Specifying Timezones
`DTSTART` values provided to AWX _must_ provide timezone information (they may
not be naive dates).
For UTC dates, `DTSTART` values should be denoted with the `Z` suffix:
@ -48,9 +48,9 @@ A list of _valid_ zone identifiers (which can vary by system) can be found at:
]
UNTIL and Timezones
===================
`DTSTART` values provided to awx _must_ provide timezone information (they may
## UNTIL and Timezones
`DTSTART` values provided to AWX _must_ provide timezone information (they may
not be naive dates).
Additionally, RFC5545 specifies that:
@ -73,9 +73,9 @@ Not Valid:
`DTSTART;TZID=America/New_York:20180601T120000 RRULE:FREQ=DAILY;INTERVAL=1;UNTIL=20180606T170000`
Previewing Schedules
====================
awx provides an endpoint for previewing the future dates and times for
## Previewing Schedules
AWX provides an endpoint for previewing the future dates and times for
a specified `RRULE`. A list of the next _ten_ occurrences will be returned in
local and UTC time:
@ -107,10 +107,9 @@ local and UTC time:
}
RRULE Limitations
=================
## RRULE Limitations
The following aspects of `RFC5545` are _not_ supported by awx schedules:
The following aspects of `RFC5545` are _not_ supported by AWX schedules:
* Strings with more than a single `DTSTART:` component
* Strings with more than a single `RRULE` component
@ -123,8 +122,7 @@ The following aspects of `RFC5545` are _not_ supported by awx schedules:
* The use of `COUNT=` in an `RRULE` with a value over 999
Implementation Details
======================
## Implementation Details
Any time an `awx.model.Schedule` is saved with a valid `rrule` value, the
`dateutil` library is used to burst out a list of all occurrences. From here,
@ -135,7 +133,7 @@ the following dates are saved in the database:
* `main_schedule.dtend` - the _last_ datetime in the list of all occurrences (coerced to UTC)
* `main_schedule.next_run` - the _next_ datetime in list after `utcnow()` (coerced to UTC)
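A minimal sketch of that bursting step, using `dateutil` directly (the rule below is illustrative):

```python
from datetime import datetime, timezone
from dateutil import rrule

# Illustrative rule: daily at noon UTC, seven occurrences in total.
rule = rrule.rrulestr(
    "DTSTART:20300115T120000Z\nRRULE:FREQ=DAILY;INTERVAL=1;COUNT=7"
)

occurrences = list(rule)                            # burst out every occurrence
dtstart = occurrences[0]                            # first occurrence
dtend = occurrences[-1]                             # last occurrence
next_run = rule.after(datetime.now(timezone.utc))   # next occurrence, or None if exhausted
```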
awx makes use of [Celery Periodic Tasks
AWX makes use of [Celery Periodic Tasks
(celerybeat)](http://docs.celeryproject.org/en/latest/userguide/periodic-tasks.html)
to run a periodic task that discovers new jobs that need to run at a regular
interval (by default, every 30 seconds). When this task starts, it queries the

View File

@ -1,20 +1,20 @@
Background Tasks in AWX
=======================
In this document, we will go into a bit of detail about how and when AWX runs Python code _in the background_ (_i.e._, _outside_ of the context of an HTTP request), such as:
In this document, we will go into a bit of detail about how and when AWX runs Python code _in the background_ (_i.e._, **outside** of the context of an HTTP request), such as:
* Any time a Job is launched in AWX (a Job Template, an Ad Hoc Command, a Project
Update, an Inventory Update, a System Job), a background process retrieves
metadata _about_ that job from the database and forks some process (_e.g._,
`ansible-playbook`, `awx-manage inventory_import`)
* Certain expensive or time-consuming tasks run in the background
* Certain expensive or time-consuming tasks running in the background
asynchronously (_e.g._, when deleting an inventory).
* AWX runs a variety of periodic background tasks on a schedule. Some examples
are:
- AWX's "Task Manager/Scheduler" wakes up periodically and looks for
`pending` jobs that have been launched and are ready to start running.
`pending` jobs that have been launched and are ready to start running
- AWX periodically runs code that looks for scheduled jobs and launches
them.
them
- AWX runs a variety of periodic tasks that clean up temporary files, and
performs various administrative checks
- Every node in an AWX cluster runs a periodic task that serves as

View File

@ -1,11 +1,11 @@
Tower configuration gives tower users the ability to adjust multiple runtime parameters of Tower, thus take fine-grained control over Tower run.
Tower configuration gives Tower users the ability to adjust multiple runtime parameters of Tower, which enables much more fine-grained control over Tower runs.
## Usage manual
#### To use
The REST endpoint for CRUD operations against Tower configurations is `/api/<version #>/settings/`. GETing to that endpoint will return a list of available Tower configuration categories and their urls, such as `"system": "/api/<version #>/settings/system/"`. The URL given to each category is the endpoint for CRUD operations against individual settings under that category.
#### To Use:
The REST endpoint for CRUD operations against Tower configurations can be found at `/api/v2/settings/`. GETing to that endpoint will return a list of available Tower configuration categories and their URLs, such as `"system": "/api/v2/settings/system/"`. The URL given to each category is the endpoint for CRUD operations against individual settings under that category.
Here is a typical Tower configuration category GET response.
Here is a typical Tower configuration category GET response:
```
GET /api/v2/settings/github-team/
HTTP 200 OK
@ -27,10 +27,10 @@ X-API-Time: 0.026s
}
```
The returned body is a JSON of key-value pairs, where the key is the name of Tower configuration setting, and the value is the value of that setting. To update the settings, simply update setting values and PUT/PATCH to the same endpoint.
The returned body is a JSON of key-value pairs, where the key is the name of the Tower configuration setting, and the value is the value of that setting. To update the settings, simply update setting values and PUT/PATCH to the same endpoint.
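For example, a minimal read-and-update sketch against one category (the base URL, token, and setting name are placeholders):

```python
import requests

AWX_URL = "https://awx.example.org"            # placeholder host
HEADERS = {"Authorization": "Bearer TOKEN"}    # placeholder auth token
URL = f"{AWX_URL}/api/v2/settings/github-team/"

current = requests.get(URL, headers=HEADERS).json()   # key-value pairs for the category

# PATCH updates only the keys supplied; unlisted keys keep their current values.
resp = requests.patch(
    URL,
    json={"SOCIAL_AUTH_GITHUB_TEAM_KEY": "new-client-id"},  # setting name assumed
    headers=HEADERS,
)
resp.raise_for_status()
```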
#### To develop
Each Django app in tower should have a `conf.py` file where related settings get registered. Below is the general format for `conf.py`:
#### To Develop:
Each Django app in Tower should have a `conf.py` file where related settings get registered. Below is the general format for `conf.py`:
```python
# Other dependencies
@ -52,7 +52,7 @@ register(
# Other setting registries
```
`register` is the endpoint API for registering individual tower configurations:
`register` is the endpoint API for registering individual Tower configurations:
```
register(
setting,
@ -66,34 +66,34 @@ register(
defined_in_file=False,
)
```
Here is the details of each argument:
Here are the details for each argument:
| Argument Name | Argument Value Type | Description |
|--------------------------|-------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| `setting` | `str` | Name of the setting. Usually all-capital connected by underscores like `'FOO_BAR'` |
| `field_class` | a subclass of DRF serializer field available in `awx.conf.fields` | The class wrapping around value of the configuration, responsible for retrieving, setting, validating and storing configuration values. |
| `**field_related_kwargs` | **kwargs | Key-worded arguments needed to initialize an instance of `field_class`. |
| `**field_related_kwargs` | `**kwargs` | Key-worded arguments needed to initialize an instance of `field_class`. |
| `category_slug` | `str` | The actual identifier used for finding individual setting categories. |
| `category` | transformable string, like `_('foobar')` | The human-readable form of `category_slug`, mainly for display. |
| `depends_on` | `list` of `str`s | A list of setting names this setting depends on. A setting this setting depends on is another tower configuration setting whose changes may affect the value of this setting. |
| `placeholder` | transformable string, like `_('foobar')` | A human-readable string displaying a typical value for the setting, mainly used by UI |
| `encrypted` | `boolean` | Flag determining whether the setting value should be encrypted |
| `defined_in_file` | `boolean` | Flag determining whether a value has been manually set in settings file. |
| `depends_on` | `list` of `str`s | A list of setting names this setting depends on. A setting this setting depends on is another Tower configuration setting whose changes may affect the value of this setting. |
| `placeholder` | transformable string, like `_('foobar')` | A human-readable string displaying a typical value for the setting, mainly used by the UI. |
| `encrypted` | `boolean` | A flag which determines whether the setting value should be encrypted. |
| `defined_in_file` | `boolean` | A flag which determines whether a value has been manually set in the settings file. |
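Putting the arguments above together, a hypothetical registration might look like the following sketch (the setting name, field class, and keyword values are illustrative, not an actual Tower setting):

```python
# conf.py (sketch)
from django.utils.translation import ugettext_lazy as _

from awx.conf import fields, register

register(
    'EXAMPLE_IDLE_TIMEOUT',              # hypothetical setting name
    field_class=fields.IntegerField,     # DRF-style field exposed by awx.conf.fields
    default=30,
    label=_('Example idle timeout'),
    help_text=_('Seconds to wait before an idle session is closed.'),
    category=_('System'),
    category_slug='system',
)
```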
During Tower bootstrapping, All settings registered in `conf.py` modules of Tower Django apps will be loaded (registered). The set of Tower configuration settings will form a new top-level of `django.conf.settings` object. Later all Tower configuration settings will be available as attributes of it, just like normal Django settings. Note Tower configuration settings take higher priority over normal settings, meaning if a setting `FOOBAR` is both defined in a settings file and registered in a `conf.py`, the registered attribute will be used over the defined attribute every time.
During Tower bootstrapping, **all** settings registered in `conf.py` modules of Tower Django apps will be loaded (registered). This set of Tower configuration settings will form a new top-level of the `django.conf.settings` object. Later, all Tower configuration settings will be available as attributes of it, just like the normal Django settings. Note that Tower configuration settings take higher priority over normal settings, meaning if a setting `FOOBAR` is both defined in a settings file *and* registered in `conf.py`, the registered attribute will be used over the defined attribute every time.
Note when registering new configurations, it is desired to provide a default value if it is possible to do so, as Tower configuration UI has a 'revert all' functionality that revert all settings to it's default value.
Please note that when registering new configurations, it is recommended to provide a default value if it is possible to do so, as the Tower configuration UI has a 'revert all' functionality that reverts all settings to their default values.
Starting from 3.2, Tower configuration supports category-specific validation functions. They should also be defined under `conf.py` in the form
Starting with version 3.2, Tower configuration supports category-specific validation functions. They should also be defined under `conf.py` in the form
```python
def custom_validate(serializer, attrs):
'''
Method details
'''
```
Where argument `serializer` refers to the underlying `SettingSingletonSerializer` object, and `attrs` refers to a dictionary of input items.
...where the argument `serializer` refers to the underlying `SettingSingletonSerializer` object, and `attrs` refers to a dictionary of input items.
Then at the end of `conf.py`, register defined custom validation methods to different configuration categories (`category_slug`) using `awx.conf.register_validate`:
At the end of `conf.py`, register defined custom validation methods to different configuration categories (`category_slug`) using `awx.conf.register_validate`:
```python
# conf.py
...
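# A sketch of a category validator (illustrative; the exact call signature of
# register_validate is assumed from the description above).
from rest_framework import serializers

from awx.conf import register_validate

def example_validate(serializer, attrs):
    '''
    Reject inconsistent values for the (hypothetical) EXAMPLE_IDLE_TIMEOUT setting.
    '''
    if attrs.get('EXAMPLE_IDLE_TIMEOUT', 0) < 0:
        raise serializers.ValidationError('EXAMPLE_IDLE_TIMEOUT must not be negative.')
    return attrs

register_validate('system', example_validate)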

View File

@ -4,19 +4,18 @@ Our channels/websocket implementation handles the communication between Tower AP
## Architecture
Tower enlists the help of the `django-channels` library to create our communications layer. `django-channels` provides us with per-client messaging integration in to our application by implementing the Asynchronous Server Gateway Interface or ASGI.
Tower enlists the help of the `django-channels` library to create our communications layer. `django-channels` provides us with per-client messaging integration in our application by implementing the Asynchronous Server Gateway Interface (ASGI).
To communicate between our different services we use RabbitMQ to exchange messages. Traditionally, `django-channels` uses Redis, but Tower uses a custom `asgi_amqp` library that allows use to RabbitMQ for the same purpose.
To communicate between our different services we use RabbitMQ to exchange messages. Traditionally, `django-channels` uses Redis, but Tower uses a custom `asgi_amqp` library that allows access to RabbitMQ for the same purpose.
Inside Tower we use the emit_channel_notification which places messages on to the queue. The messages are given an explicit
event group and event type which we later use in our wire protocol to control message delivery to the client.
Inside Tower we use the `emit_channel_notification` function which places messages onto the queue. The messages are given an explicit event group and event type which we later use in our wire protocol to control message delivery to the client.
## Protocol
You can connect to the Tower channels implementation using any standard websocket library but pointing it to `/websocket`. You must
You can connect to the Tower channels implementation using any standard websocket library by pointing it to `/websocket`. You must
provide a valid Auth Token in the request URL.
Once you've connected, you are not subscribed to any event groups. You subscribe by sending a json request that looks like the following:
Once you've connected, you are not subscribed to any event groups. You subscribe by sending a `json` request that looks like the following:
'groups': {
'jobs': ['status_changed', 'summary'],
@ -30,37 +29,28 @@ Once you've connected, you are not subscribed to any event groups. You subscribe
'control': ['limit_reached_<user_id>'],
}
These map to the event group and event type you are interested in. Sending in a new groups dictionary will clear all of your previously
subscribed groups before subscribing to the newly requested ones. This is intentional, and makes the single page navigation much easier since
you only need to care about current subscriptions.
These map to the event group and event type that the user is interested in. Sending in a new groups dictionary will clear all previously-subscribed groups before subscribing to the newly requested ones. This is intentional, and makes the single page navigation much easier since users only need to care about current subscriptions.
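A rough client-side sketch of that subscription flow, using the third-party `websocket-client` package (the host and the token query parameter are assumptions; only the `/websocket` path and the groups payload come from this document):

```python
import json

import websocket  # third-party "websocket-client" package

# Placeholder host; the exact way the auth token is encoded in the URL is an assumption.
ws = websocket.create_connection("wss://awx.example.org/websocket/?token=TOKEN")

# Subscribe to job status/summary events; this replaces any previous subscriptions.
ws.send(json.dumps({
    "groups": {
        "jobs": ["status_changed", "summary"],
        "control": ["limit_reached_1"],   # "1" stands in for the current user's ID
    }
}))

print(ws.recv())  # first message delivered for the subscribed groups
ws.close()
```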
## Deployment
This section will specifically discuss deployment in the context of websockets and the path your request takes through the system.
This section will specifically discuss deployment in the context of websockets and the path those requests take through the system.
Note: The deployment of Tower changes slightly with the introduction of `django-channels` and websockets. There are some minor differences between
production and development deployments that I will point out, but the actual services that run the code and handle the requests are identical
between the two environments.
**Note:** The deployment of Tower changes slightly with the introduction of `django-channels` and websockets. There are some minor differences between production and development deployments that will be pointed out in this document, but the actual services that run the code and handle the requests are identical between the two environments.
### Services
| Name | Details |
|:-----------:|:-----------------------------------------------------------------------------------------------------------:|
| nginx | listens on ports 80/443, handles HTTPS proxying, serves static assets, routes requests for daphne and uwsgi |
| uwsgi | listens on port 8050, handles API requests |
| daphne | listens on port 8051, handles Websocket requests |
| runworker | no listening port, watches and processes the message queue |
| supervisord | (production-only) handles the process management of all the services except nginx |
| `nginx` | listens on ports 80/443, handles HTTPS proxying, serves static assets, routes requests for `daphne` and `uwsgi` |
| `uwsgi` | listens on port 8050, handles API requests |
| `daphne` | listens on port 8051, handles websocket requests |
| `runworker` | no listening port, watches and processes the message queue |
| `supervisord` | (production-only) handles the process management of all the services except `nginx` |
When a request comes in to *nginx* and have the `Upgrade` header and is for the path `/websocket`, then *nginx* knows that it should
be routing that request to our *daphne* service.
When a request comes in to `nginx` and has the `Upgrade` header and is for the path `/websocket`, then `nginx` knows that it should be routing that request to our `daphne` service.
*daphne* receives the request and generates channel and routing information for the request. The configured event handlers for *daphne*
then unpack and parse the request message using the wire protocol mentioned above. This ensures that the connect has its context limited to only
receive messages for events it is interested in. *daphne* uses internal events to trigger further behavior, which will generate messages
and send them to the queue, that queue is processed by the *runworker*.
`daphne` receives the request and generates channel and routing information for the request. The configured event handlers for `daphne` then unpack and parse the request message using the wire protocol mentioned above. This ensures that the connection has its context limited to only receive messages for events it is interested in. `daphne` uses internal events to trigger further behavior, which will generate messages and send them to the queue, which is then processed by the `runworker`.
*runworker* processes the messages from the queue. This uses the contextual information of the message provided
by the *daphne* server and our *asgi_amqp* implementation to broadcast messages out to each client.
`runworker` processes the messages from the queue. This uses the contextual information of the message provided by the `daphne` server and our `asgi_amqp` implementation to broadcast messages out to each client.
### Development
- nginx listens on 8013/8043 instead of 80/443
- `nginx` listens on 8013/8043 instead of 80/443

View File

@ -1,25 +1,32 @@
## Tower Workflow Overview
Workflows are structured compositions of Tower job resources. The only job of a workflow is to trigger other jobs in specific orders to achieve certain goals, such as tracking the full set of jobs that were part of a release process as a single unit.
A workflow has an associated tree-graph that is composed of multiple nodes. Each node in the tree has one associated job template (job template, inventory update, project update, or workflow job template) along with related resources that, if defined, will override the associated job template resources (i.e. credential, inventory, etc.) if the job template associated with the node is chosen to run.
A workflow has an associated tree-graph that is composed of multiple nodes. Each node in the tree has one associated job template (job template, inventory update, project update, or workflow job template) along with related resources that, if defined, will override the associated job template resources (*i.e.*, credential, inventory, etc.) if the job template associated with the node is selected to run.
## Usage Manual
### Workflow Create-Read-Update-Delete (CRUD)
Like other job resources, workflow jobs are created from workflow job templates. The API exposes common fields similar to job templates, including labels, schedules, notification templates, extra variables and survey specifications. Other than that, in the API, the related workflow graph nodes can be gotten to via the related workflow_nodes field.
Like other job resources, workflow jobs are created from workflow job templates. The API exposes common fields similar to job templates, including labels, schedules, notification templates, extra variables and survey specifications. Other than that, in the API, the related workflow graph nodes can be accessed via the related `workflow_nodes` field.
The CRUD operations against a workflow job template and its corresponding workflow jobs are almost identical to those of normal job templates and related jobs.
By default, organization administrators have full control over all workflow job templates under the same organization, and they share these abilities with users who have the `workflow_admin_role` in that organization. Permissions can be further delegated to other users via the workflow job template roles.
### Workflow Nodes
Workflow Nodes are containers of workflow spawned job resources and function as nodes of workflow decision trees. Like that of workflow itself, the two types of workflow nodes are workflow job template nodes and workflow job nodes.
Workflow job template nodes are listed and created under endpoint `/workflow_job_templates/\d+/workflow_nodes/` to be associated with underlying workflow job template, or directly under endpoint `/workflow_job_template_nodes/`. The most important fields of a workflow job template node are `success_nodes`, `failure_nodes`, `always_nodes`, `unified_job_template` and `workflow_job_template`. The former three are lists of workflow job template nodes that, in union, forms the set of all its child nodes, in specific, `success_nodes` are triggered when parent node job succeeds, `failure_nodes` are triggered when parent node job fails, and `always_nodes` are triggered regardless of whether parent job succeeds or fails; The later two reference the job template resource it contains and workflow job template it belongs to.
### Workflow Nodes
Workflow Nodes are containers of workflow-spawned job resources and function as nodes of workflow decision trees. Like that of the workflow itself, the two types of workflow nodes are workflow job template nodes and workflow job nodes.
Workflow job template nodes are listed and created under the `/workflow_job_templates/\d+/workflow_nodes/` endpoint to be associated with the underlying workflow job template, or directly under the endpoint `/workflow_job_template_nodes/`. The most important fields of a workflow job template node are `success_nodes`, `failure_nodes`, `always_nodes`, `unified_job_template` and `workflow_job_template`. The first three are lists of workflow job template nodes that, in union, form the set of all of its child nodes; specifically, `success_nodes` are triggered when the parent node job succeeds, `failure_nodes` are triggered when the parent node job fails, and `always_nodes` are triggered regardless of whether the parent job succeeds or fails. The latter two fields reference the job template resource it contains and the workflow job template it belongs to.
#### Workflow Launch Configuration
Workflow job templates can contain launch configuration items. So far, these only include
Workflow job templates can contain launch configuration items. So far, these only include:
- `extra_vars`
- `inventory`
- `limit`
@ -31,7 +38,7 @@ a survey, in the same way that job templates work.
Workflow nodes may also contain the launch-time configuration for the job it will spawn.
As such, they share all the properties common to all saved launch configurations.
When a workflow job template is launched a workflow job is created. If the workflow
When a workflow job template is launched, a workflow job is created. If the workflow
job template is set to prompt for a value, then the user may provide this on launch,
and the workflow job will assume the user-provided value.
@ -39,7 +46,7 @@ A workflow job node is created for each WFJT node and all fields from the WFJT n
If the workflow job and the node both specify the same prompt, then the workflow job
takes precedence and its value will be used. In either case, if the job template
the node references does not have the related prompting field set to true
the node references does not have the related prompting field set to `true`
(such as `ask_inventory_on_launch`), then the prompt will be ignored, and the
job template default, if it exists, will be used instead.
@ -47,10 +54,11 @@ See the document on saved launch configurations for how these are processed
when the job is launched, and the API validation involved in building
the launch configurations on workflow nodes.
#### Workflows as Workflow Nodes
A workflow can be added as a node in another workflow. The child workflow is the associated
`unified_job_template` that the node references, when that node is added to the parent workflow.
`unified_job_template` that the node references when that node is added to the parent workflow.
When the parent workflow dispatches that node, then the child workflow will begin running, and
the parent will resume execution of that branch when the child workflow finishes.
Branching into success / failed pathways is decided based on the status of the child workflow.
@ -59,6 +67,7 @@ In the event that spawning the workflow would result in recursion, the child wor
will be marked as failed with a message explaining that recursion was detected.
This is to prevent saturation of the task system with an infinite chain of workflows.
#### Workflow Approval Nodes
The workflow approval node feature enables users to add approval steps between nodes in a workflow so that a user (as long as they have approval permissions, explained in further detail below) can give the "yes" or "no" to continue on to the next step in the workflow.
@ -86,20 +95,24 @@ A timeout (in minutes and seconds) can be set for each approval node. These fiel
### DAG Formation and Restrictions
The DAG structure of a workflow is enforced by associating workflow job template nodes via endpoints `/workflow_job_template_nodes/\d+/*_nodes/`, where `*` has options `success`, `failure` and `always`. There is one restriction that is enforced when setting up new connections and that is the cycle restriction, since it's a DAG.
The directed acyclic graph (DAG) structure of a workflow is enforced by associating workflow job template nodes via endpoints `/workflow_job_template_nodes/\d+/*_nodes/`, where `*` has options `success`, `failure` and `always`. There is one restriction that is enforced when setting up new connections and that is the cycle restriction, since it's a DAG.
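For example, a minimal association sketch in which node 20 becomes a success child of node 10 (the base URL, token, and IDs are placeholders; the `{"id": ...}` body is the usual association convention and is assumed here):

```python
import requests

AWX_URL = "https://awx.example.org"            # placeholder host
HEADERS = {"Authorization": "Bearer TOKEN"}    # placeholder auth token

# Node 20 will run when node 10 succeeds; an association that creates a cycle
# is expected to be rejected with a 400-level response.
resp = requests.post(
    f"{AWX_URL}/api/v2/workflow_job_template_nodes/10/success_nodes/",
    json={"id": 20},
    headers=HEADERS,
)
resp.raise_for_status()
```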
### Workflow Run Details
A typical workflow run starts by either POSTing to endpoint `/workflow_job_templates/\d+/launch/`, or being triggered automatically by related schedule. At the very first, the workflow job template creates workflow job, and all related workflow job template nodes create workflow job nodes. Right after that, all root nodes are populated with corresponding job resources and start running. If nothing goes wrong, each decision tree will follow its own route to completion. The entire workflow finishes running when all its decision trees complete.
As stated, workflow job templates can be created with populated `extra_vars`. These `extra_vars` are combined with the `extra_vars` of any job template launched by the workflow with higher variable precedence, meaning they will overwrite job template variables with the same name. Note before the extra_vars set is applied as runtime job extra variables, it might be expanded and over-written by the cumulative job artifacts of ancestor nodes. The meaning of 'cumulative' here is children overwriting parent. For example, if a node has a parent node and a grandparent node, and both ancestors generate job artifacts, then the job artifacts of grandparent node is overwritten by that of parent node to form the set of cumulative job artifacts of the current node.
A typical workflow run starts by either POSTing to endpoint `/workflow_job_templates/\d+/launch/`, or being triggered automatically by related schedule. At the very first, the workflow job template creates a workflow job, and all related workflow job template nodes create workflow job nodes. Right after that, all root nodes are populated with corresponding job resources and start running. If nothing goes wrong, each decision tree will follow its own route to completion. The entire workflow finishes running when all of its decision trees complete.
Job resources spawned by workflow jobs are needed by workflow to run correctly. Therefore deletion of spawned job resources is blocked while the underlying workflow job is executing.
As stated, workflow job templates can be created with populated `extra_vars`. These `extra_vars` are combined with the `extra_vars` of any job template launched by the workflow with higher variable precedence, meaning they will overwrite job template variables with the same name. Note that before the `extra_vars` set is applied as runtime job extra variables, it might be expanded and over-written by the cumulative job artifacts of ancestor nodes. The meaning of 'cumulative' here is children overwriting parent. For example, if a node has a parent node and a grandparent node, and both ancestors generate job artifacts, then the job artifacts of grandparent node is overwritten by that of parent node to form the set of cumulative job artifacts of the current node.
Job resources spawned by workflow jobs are necessary for workflows to run correctly. Therefore, the deletion of spawned job resources is blocked while the underlying workflow job is executing.
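A minimal launch sketch reflecting the behavior above (base URL, token, template ID, and the `extra_vars` payload are placeholders; prompted values are only honored when the template is configured to prompt for them):

```python
import requests

AWX_URL = "https://awx.example.org"            # placeholder host
HEADERS = {"Authorization": "Bearer TOKEN"}    # placeholder auth token

# Launch workflow job template 7 with runtime extra_vars.
resp = requests.post(
    f"{AWX_URL}/api/v2/workflow_job_templates/7/launch/",
    json={"extra_vars": {"release_version": "1.2.3"}},
    headers=HEADERS,
)
resp.raise_for_status()
print(resp.json().get("workflow_job"))  # ID of the spawned workflow job (field name assumed)
```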
Other than success and failure, a workflow spawned job resource can also end with status 'error' and 'canceled'. When a workflow spawned job resource errors or is canceled, it is treated the same as failure. If the unified job template of the node is null (which could be a result of deleting the unified job template or copying a workflow when the user lacks necessary permissions to use the resource), then the node will be treated as 'failed' and the failure paths will continue to execute.
A workflow job itself can also be canceled. In this case all its spawned job resources will be canceled if cancelable and following paths stop executing.
A workflow job itself can also be canceled. In this case all of its spawned job resources will be canceled (if cancellation is allowed) and its following paths stop executing.
Like job templates, workflow job templates can be associated with notification templates and notifications work exactly the same as that of job templates. One distinction is the notification message body. Workflow jobs send a notification body that contains not only the status of itself, but also status of all its spawned jobs. A typical notification body looks like this:
Like job templates, workflow job templates can be associated with notification templates and notifications work exactly the same as that of job templates. One distinction is the notification message body. Workflow jobs sends notification body that contains not only the status of itself, but also status of all its spawned jobs. A typical notification body looks like this:
```
Workflow job summary:
@ -109,22 +122,28 @@ Workflow job summary:
...
```
Starting from Tower 3.2, Workflow jobs support simultaneous job runs just like that of ordinary jobs. It is controlled by `allow_simultaneous` field of underlying workflow job template. By default, simultaneous workflow job runs are disabled and users should be prudent in enabling this functionality. Because the performance boost of simultaneous workflow runs will only manifest when a large portion of jobs contained by a workflow allow simultaneous runs. Otherwise it is expected to have some long-running workflow jobs since its spawned jobs can be in pending state for a long time.
Starting from Tower 3.2, Workflow jobs support simultaneous job runs just like that of ordinary jobs. It is controlled by the `allow_simultaneous` field of underlying workflow job template. By default, simultaneous workflow job runs are disabled and users should be prudent in enabling this functionality, since the performance boost of simultaneous workflow runs will only manifest when a large portion of jobs contained by a workflow allow simultaneous runs. Otherwise, it is expected to have some long-running workflow jobs since its spawned jobs can be in pending state for a long time.
A workflow job is marked as failed if a job spawned by a workflow job fails, without a failure handler. A failure handler is a `failure` or `always` link in the workflow job template. A job that is canceled is, effectively, considered a failure for the purposes of determining if a job node is failed.
A workflow job is marked as failed if a job spawned by a workflow job fails, without a failure handler. A failure handler is a failure or always link in the workflow job template. A job that is canceled is, effectively, considered a failure for purposes of determining if a job node is failed.
### Workflow Copy and Relaunch
Other than the normal way of creating workflow job templates, it is also possible to copy existing workflow job templates. The resulting new workflow job template will be mostly identical to the original, except for `name` field which will be appended a text to indicate it's a copy.
Workflow job templates can be copied by POSTing to endpoint `/workflow_job_templates/\d+/copy/`. After copy finished, the resulting new workflow job template will have identical fields including description, extra_vars, and survey-related fields (survey_spec and survey_enabled). More importantly, workflow job template node of the original workflow job template, as well as the topology they bear, will be copied. Note there are RBAC restrictions on copying workflow job template nodes. A workflow job template is allowed to be copied if the user has permission to add an equivalent workflow job template. If the user performing the copy does not have access to a node's related resources (job template, inventory, or credential), those related fields will be null in the copy's version of the node. Schedules and notification templates of the original workflow job template will not be copied nor shared, and the name of the created workflow job template is the original name plus a special-formatted suffix to indicate its copy origin as well as the copy time, such as 'copy_from_name@10:30:00 am'.
Other than the normal way of creating workflow job templates, it is also possible to copy existing workflow job templates. The resulting new workflow job template will be mostly identical to the original, except for the `name` field which will be appended in a way to indicate that it's a copy.
Workflow job templates can be copied by POSTing to endpoint `/workflow_job_templates/\d+/copy/`. After the copy finishes, the resulting new workflow job template will have identical fields including description, `extra_vars`, and survey-related fields (`survey_spec` and `survey_enabled`). More importantly, the workflow job template nodes of the original workflow job template, as well as the topology they bear, will be copied. Note that there are RBAC restrictions on copying workflow job template nodes. A workflow job template is allowed to be copied if the user has permission to add an equivalent workflow job template. If the user performing the copy does not have access to a node's related resources (job template, inventory, or credential), those related fields will be null in the copy's version of the node. Schedules and notification templates of the original workflow job template will not be copied nor shared, and the name of the created workflow job template is the original name plus a special-formatted suffix to indicate its copy origin as well as the copy time, such as `'copy_from_name@10:30:00 am'`.
Workflow jobs cannot be copied directly; instead, a workflow job is implicitly copied when it needs to relaunch. Relaunching an existing workflow job is done by POSTing to endpoint `/workflow_jobs/\d+/relaunch/`. What happens next is the original workflow job's prompts are re-applied to its workflow job template to create a new workflow job. Finally, the full-fledged new workflow job is triggered to run, thus fulfilling the purpose of relaunch. Survey password-type answers should also be redacted in the relaunched version of the workflow job.
Workflow jobs cannot be copied directly, instead a workflow job is implicitly copied when it needs to relaunch. Relaunching an existing workflow job is done by POSTing to endpoint `/workflow_jobs/\d+/relaunch/`. What happens next is the original workflow job's prompts are re-applied to its workflow job template to create a new workflow job. Finally the full-fledged new workflow job is triggered to run, thus fulfilling the purpose of relaunch. Survey password-type answers should also be redacted in the relaunched version of the workflow job.
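A short sketch of both operations (placeholders as before; the optional `name` override in the copy body follows the copy behavior described earlier in this document):

```python
import requests

AWX_URL = "https://awx.example.org"            # placeholder host
HEADERS = {"Authorization": "Bearer TOKEN"}    # placeholder auth token

# Copy workflow job template 7, overriding the auto-generated "copy_from_..." name.
copy = requests.post(
    f"{AWX_URL}/api/v2/workflow_job_templates/7/copy/",
    json={"name": "Release pipeline (staging)"},
    headers=HEADERS,
)
copy.raise_for_status()

# Relaunch workflow job 99; the original job's prompts are re-applied to build the new job.
relaunch = requests.post(f"{AWX_URL}/api/v2/workflow_jobs/99/relaunch/", headers=HEADERS)
relaunch.raise_for_status()
```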
### Artifacts
Artifact support starts in Ansible and is carried through in Tower. The `set_stats` module is invoked by users, in a playbook, to register facts. Facts are passed in via `data:` argument. Note that the default `set_stats` parameters are the correct ones to work with Tower (i.e. `per_host: no`). Now that facts are registered, we will describe how facts are used. In Ansible, registered facts are "returned" to the callback plugin(s) via the `playbook_on_stats` event. Ansible users can configure whether or not they want the facts displayed through the global `show_custom_stats` configuration. Note that the `show_custom_stats` does not effect the artifacting feature of Tower. This only controls the displaying of `set_stats` fact data in Ansible output (also the output in Ansible playbooks ran in Tower). Tower uses a custom callback plugin that gathers the fact data set via `set_stats` in the `playbook_on_stats` handler and "ships" it back to Tower, saves it in the database, and makes it available on the job endpoint via the variable `artifacts`. The semantics and usage of `artifacts` throughout a workflow is described elsewhere in this document.
Support for artifacts starts in Ansible and is carried through in Tower. The `set_stats` module is invoked by users, in a playbook, to register facts. Facts are passed in via the `data:` argument. Note that the default `set_stats` parameters are the correct ones to work with Tower (*i.e.*, `per_host: no`). Now that facts are registered, we will describe how facts are used. In Ansible, registered facts are "returned" to the callback plugin(s) via the `playbook_on_stats` event. Ansible users can configure whether or not they want the facts displayed through the global `show_custom_stats` configuration. Note that `show_custom_stats` does not affect the artifact feature of Tower. This only controls the displaying of `set_stats` fact data in Ansible output (also the output in Ansible playbooks that get run in Tower). Tower uses a custom callback plugin that gathers the fact data set via `set_stats` in the `playbook_on_stats` handler and "ships" it back to Tower, saves it in the database, and makes it available on the job endpoint via the variable `artifacts`. The semantics and usage of `artifacts` throughout a workflow are described elsewhere in this document.
### Workflow Run Example
To best understand the nuances of workflow run logic we will look at an example workflow run as it progresses through the 'running' state. In the workflow examples below nodes are labeled `<do_not_run, job_status, node_id>` where `do_not_run` can be `RUN` or `DNR` where `DNR` means 'do not run the node' and `RUN` which means will run the node. Nodes start out with `do_not_run = False` depicted as `RUN` in the pictures below. When nodes are known to not run they will be marked `DNR` and the state will not change. `job_status` is the job's status associated with the node. `node_id` is the unique id for the workflow job node.
To best understand the nuances of workflow run logic, we will look at an example workflow run as it progresses through the 'running' state. In the workflow examples below, nodes are labeled `<do_not_run, job_status, node_id>`, where `do_not_run` can be `RUN` or `DNR`; `DNR` means 'do not run the node' and `RUN` means 'run the node'. Nodes start out with `do_not_run = False`, depicted as `RUN` in the pictures below. When nodes are known to not run, they will be marked `DNR` and the state will not change. `job_status` is the status of the job associated with the node. `node_id` is the unique ID for the workflow job node.
<p align="center">
<img src="img/workflow_step0.png">
@ -132,60 +151,66 @@ To best understand the nuances of workflow run logic we will look at an example
</p>
<p align="center">
<img src="img/workflow_step1.png">
Root nodes are selected to run. A root node is a node with no incoming nodes. Node 0 is selected to run and results in a status of 'successful'. Nodes 1, 4, and 5 are marked 'DNR' because they are in the failure path. Node 6 is not marked 'DNR' because nodes 2 and 3 may run and result and node 6 running. The same reasoning is why nodes 7, 8, 9 are not marked 'DNR'.
Root nodes are selected to run. A root node is a node with no incoming nodes. Node 0 is selected to run and results in a status of `'successful'`. Nodes 1, 4, and 5 are marked `'DNR'` because they are in the failure path. Node 6 is not marked `'DNR'` because nodes 2 and 3 may run and result in node 6 running. The same reasoning applies to why nodes 7, 8, and 9 are not marked `'DNR'`.
</p>
<p align="center">
<img src="img/workflow_step2.png">
Nodes 2 and 3 are selected to run and their job results are both 'successful'. Node 6 is not marked 'DNR' because node 3 will trigger node 6.
Nodes 2 and 3 are selected to run and their job results are both `'successful'`. Node 6 is not marked `'DNR'` because node 3 will trigger node 6.
</p>
<p align="center">
<img src="img/workflow_step3.png">
Node 6 is selected to run and the job results in 'failed'. Node 8 is marked 'DNR' because of the success path. Nodes 7 and 8 will be ran in the next cycle.
Node 6 is selected to run and the job results in `'failed'`. Node 8 is marked `'DNR'` because of the success path. Nodes 7 and 8 will be run in the next cycle.
</p>
<p align="center">
<img src="img/workflow_step4.png">
Node 7 and 8 are selected to run and their job results are both 'successful'.
Nodes 7 and 8 are selected to run and their job results are both `'successful'`.
</p>
The resulting state of the workflow job run above would be 'successful'. Although individual nodes fail, the overall workflow job status is 'successful' because all individual node failures have error handling paths ('failed_nodes' or 'always_nodes').
The resulting state of the workflow job run above would be `'successful'`. Although individual nodes fail, the overall workflow job status is `'successful'` because all individual node failures have error handling paths (`'failed_nodes'` or `'always_nodes'`).
## Test Coverage
### CRUD-related
* Verify that CRUD operations on all workflow resources are working properly. Note workflow job nodes cannot be created or deleted independently, but verifications are needed to make sure when a workflow job is deleted, all its related workflow job nodes are deleted.
* Verify that CRUD operations on all workflow resources are working properly. Note that workflow job nodes cannot be created or deleted independently, but verifications are needed to ensure that when a workflow job is deleted, all its related workflow job nodes are also deleted.
* Verify the RBAC properties of workflow resources. Specifically:
* Workflow job templates should only be accessible to superusers: system admins and admins of the same organization, as well as the system auditor and auditors of the same organization (the latter with read permission only).
* Workflow job read and delete permissions follow from its associated workflow job template.
* Workflow job relaunch permission consists of the union of execute permission to its associated workflow job template, and the permission to re-create all the nodes inside of the workflow job.
* Workflow job template nodes rely their permission rules on the permission rules of both their associated workflow job template and unified job template for creation and editing.
* Workflow job template nodes rely on their permission rules for both their associated workflow job template and unified job template for creating and editing.
* Workflow job template nodes can be deleted with admin permission to their workflow job template (even lacking permission to the node's job template).
* Workflow job nodes are viewable if their workflow job is viewable.
* No CRUD actions are possible on workflow job nodes by any user, and they may only be deleted by deleting their workflow job.
* Workflow jobs can be deleted by superusers and org admins of the organization of its associated workflow job template, and no one else.
* Verify that workflow job template nodes can be created under, or (dis)associated with workflow job templates.
* Verify that only the permitted job template types can be associated with a workflow job template node. Currently the permitted types are *job templates, inventory sources, projects, and workflow job templates*.
* Verify that workflow job template nodes under the same workflow job template can be associated to form parent-child relationship of decision trees. In specific, one node takes another as its child node by POSTing another node's id to one of the three endpoints: `/success_nodes/`, `/failure_nodes/` and `/always_nodes/`.
* Verify that workflow job template nodes are not allowed to have invalid association. Any attempt that causes invalidity will trigger 400-level response (i.e. cycles).
* Verify that a workflow job template can be successfully copied and the created workflow job template does not miss any field that should be copied or intentionally modified.
* Verify that workflow job template nodes under the same workflow job template can be associated to form a parent-child relationship of decision trees. Specifically, one node takes another as its child node by POSTing another node's ID to one of the three endpoints: `/success_nodes/`, `/failure_nodes/` and `/always_nodes/`.
* Verify that workflow job template nodes are not allowed to have invalid associations. Any attempt that causes invalidity will trigger a 400-level response (*i.e.*, cycles).
* Verify that a workflow job template can be successfully copied, and that the created workflow job template does not miss any field that should be copied or intentionally modified.
* Verify that if a user has no access to any of the related resources of a workflow job template node, those resources will not be copied and will have `null` as a placeholder.
* Verify that `artifacts` is populated when `set_stats` is used in Ansible >= v2.2.1.0-0.3.rc3.
### Task-related
* Verify that workflow jobs can be launched by POSTing to endpoint `/workflow_job_templates/\d/launch/`.
* Verify that schedules can be successfully (dis)associated with a workflow job template, and that workflow jobs can be triggered by the schedule of the associated workflow job template at the specified time.
* Verify that extra variables work for workflow job templates as described. Specifically, verify the role of workflow job extra variables as a set of global runtime variables over all of its spawned jobs.
* Verify that extra variables of a workflow job node are correctly overwritten in order by the cumulative job artifacts of ancestors, and the overwrite policy of cumulative job artifacts is correct (artifacts of parent overwrite artifacts of grandparent).
* Verify that during a workflow job run, all its decision trees follow their correct paths of execution. Unwarranted behaviors include child node executing before its parent and wrong path being selected (*failure nodes* are executed when parent node *succeeds* and so on).
* Verify that during a workflow job run, all of its decision trees follow their correct paths of execution. Unwarranted behaviors include the child node executing before its parent and wrong path being selected (*failure nodes* are executed when parent node *succeeds* and so on).
* Verify that a subtree of execution will never start if its root node runs into an internal error (*as opposed to ending with a failure status*).
* Verify that a subtree of execution will never start if its root node is successfully canceled.
* Verify that cancelling a workflow job that is cancellable will consequently cancel any of its cancellable spawned jobs and thus interrupt the whole workflow execution.
* Verify that during a workflow job run, deleting its spawned jobs is prohibited.
* Verify that at the beginning of each spawned job run, its prompted fields will be populated by the wrapping workflow job node with corrected values. For example, related `credentials` of workflow job node go to `credentials` of spawned job.
* Verify that notification templates can be successfully (dis)associated with a workflow job template. Later when its spawned workflow jobs finish running, verify that the correct type of notifications will be sent according to the job status.
* Verify that at the beginning of each spawned job run, its prompted fields will be populated by the wrapping workflow job node with corrected values. For example, related `credentials` of the workflow job node go to `credentials` of spawned job.
* Verify that notification templates can be successfully (dis)associated with a workflow job template. Later, when its spawned workflow jobs finish running, verify that the correct type of notification will be sent according to the job status.
* Verify that a workflow job can be successfully relaunched.
## Test Notes
* Please apply non-trivial topology when testing workflow run. A non-trivial topology for a workflow job template should include:
* Please apply non-trivial topology when testing a workflow run. A non-trivial topology for a workflow job template should include:
* Multiple decision trees.
* Relatively large hight in each decision tree.
* Relatively large height in each decision tree.
* All three types of relationships (`success`, `failure` and `always`).