* Instead of traversing the workflow graph to determine whether a workflow
is done or has failed, loop through all the nodes in the graph and
collect only the relevant nodes.
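A minimal sketch of the node-scan approach, assuming illustrative attribute names (`node.job`, `node.do_not_run`, job `status` values) rather than the exact AWX fields:

```python
# Hedged sketch: determine completion by scanning the nodes once instead
# of walking the graph edge by edge.
DONE_STATES = ('successful', 'failed', 'error', 'canceled')

def workflow_is_done(nodes):
    for node in nodes:
        if node.job is None and not node.do_not_run:
            return False  # a relevant node has not spawned its job yet
        if node.job is not None and node.job.status not in DONE_STATES:
            return False  # a spawned job is still running
    return True
```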
* clarify help text and squash migrations
* add new internal_limit field to the Job model for faster reference
* if the field is non-blank, populate shard params in summary_fields
* add summary information to UI job/wfj details and the JT selector
* allow sharding with prompts and schedules
* modify the create_unified_job contract to pass the class & parent_field name
* make parent field name an instance method & set the sharded UJT field
* make access methods compatible with job sharding
* move shard job special logic from the task manager to workflows
* save sharded job prompts to the workflow job exclusively
* allow using sharded jobs in workflows
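To make the internal_limit idea concrete, here is a hedged sketch of how per-shard limits could be derived from an inventory; the round-robin split and the limit string format are assumptions, not the actual AWX implementation:

```python
def shard_limits(host_names, shard_count):
    """Split hosts round-robin into shard_count groups and return an
    Ansible-style limit string for each shard (format assumed)."""
    shards = [[] for _ in range(shard_count)]
    for i, name in enumerate(host_names):
        shards[i % shard_count].append(name)
    return [':'.join(group) for group in shards]

# shard_limits(['h1', 'h2', 'h3'], 2) -> ['h1:h3', 'h2']
```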
This commit implements the bulk of `awx-manage run_dispatcher`, a new
command that binds to RabbitMQ via kombu and balances messages across
a pool of workers similar in spirit to celeryd workers.
Specifically, this includes:
- a new decorator, `awx.main.dispatch.task`, which can be used to
decorate functions or classes so that they can be designated as
"Tasks"
- support for fanout/broadcast tasks (at this point in time, only
`conf.Setting` memcached flushes use this functionality)
- support for job reaping
- support for success/failure hooks for job runs (i.e.,
`handle_work_success` and `handle_work_error`)
- support for an auto-scaling worker pool that scales processes up and
down on demand
- minimal support for RPC, such as status checks and pool recycle/reload
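As a rough illustration (the decorator is named in this changeset, but the exact signature and submission API shown here are assumptions):

```python
from awx.main.dispatch import task

@task()
def add(a, b):
    # runs inside a dispatcher pool worker, not in the calling process
    return a + b

# a caller publishes the task over the broker instead of invoking it
# directly, along the lines of:
# add.apply_async([1, 2], queue='tower')
```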
* celery workers have internal queue names based on the system hostname.
This may differ from the name Tower knows the host by, Instance.hostname.
This adds a mapping so we can convert internal celery names to Instance
names for the purposes of reaping jobs.
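A hedged sketch of that mapping; the worker-name format and attribute names are assumptions:

```python
def map_workers_to_instances(worker_names, instances):
    """Map celery's internal worker names (derived from the system
    hostname, e.g. 'celery@host1') to the Instance objects Tower tracks
    by Instance.hostname."""
    by_hostname = {instance.hostname: instance for instance in instances}
    mapping = {}
    for worker in worker_names:
        hostname = worker.split('@', 1)[-1]
        if hostname in by_hostname:
            mapping[worker] = by_hostname[hostname]
    return mapping
```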
* Randomly choose an instance from the controller instance group to
control the isolated node run. Record the chosen instance in the job's
controller_node field.
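For example (relation and field names assumed for illustration):

```python
import random

def assign_controller_node(job, controller_group):
    """Pick one instance at random from the controller group and record
    it on the job's controller_node field."""
    controller = random.choice(list(controller_group.instances.all()))
    job.controller_node = controller.hostname
    job.save(update_fields=['controller_node'])
```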
* Deciding which Instance a Job runs on at celery task run-time makes
it hard to distribute tasks evenly among Instances. Instead, the task
manager will look at the world of running jobs and choose an instance
to run on, applying a deterministic job distribution algorithm.
This also relaxes some of the task manager rules on Instance Groups
throughout the stack, so that workflow jobs tend to shortcut that
processing or omit it altogether.
This lets the workflow job spawning logic live outside of the
instance group queues, which it never needed to participate in to
begin with.
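A hedged sketch of one deterministic distribution rule the task manager could apply; the capacity accounting and attribute names are assumptions:

```python
def choose_execution_node(instances, running_jobs):
    """Pick the instance with the most remaining capacity, breaking ties
    by hostname so the same inputs always yield the same choice."""
    def remaining_capacity(instance):
        used = sum(job.task_impact for job in running_jobs
                   if job.execution_node == instance.hostname)
        return instance.capacity - used

    best = max(instances, key=lambda i: (remaining_capacity(i), i.hostname))
    return best.hostname
```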
Consolidate prompts accept/reject logic in unified models
Break out accept/reject logic for variables
Surface new promptable fields on WFJT nodes and schedules
Make schedules and workflows accurately reject variables that are not
allowed by the prompting rules or the survey rules on the template
Validate against unallowed extra_data in system job schedules
Prevent schedule or WFJT node POST/PATCH with unprompted data
Move system job days validation to new mechanism
Add new pseudo-field for WFJT node credential
Add validation for node related credentials
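A hedged sketch of the accept/reject split for prompted variables; the method and field names are illustrative, not the exact serializer or model API:

```python
def accept_or_reject_variables(template, extra_data):
    """Split user-supplied variables into those allowed by the prompting
    or survey rules and those that must be rejected."""
    survey_keys = {q['variable'] for q in template.survey_spec.get('spec', [])}
    accepted, rejected = {}, {}
    for key, value in extra_data.items():
        if template.ask_variables_on_launch or key in survey_keys:
            accepted[key] = value
        else:
            rejected[key] = value
    return accepted, rejected
```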
Add related config model to unified job
Use JobLaunchConfig model for launch RBAC check
Support credential overwrite behavior with multi-creds
Change modern manual launch to use merge behavior
Refactor JobLaunchSerializer, self.instance=None
Modularize job launch view to create "modern" data
Auto-create config object with every job
Add create schedule endpoint for jobs
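A sketch of what the related config model could look like; the field names and types are assumptions for illustration, not the actual schema:

```python
from django.db import models

class JobLaunchConfig(models.Model):
    """Stores the prompted values a job was launched with so that
    schedules, relaunch, and RBAC checks can reference them later."""
    job = models.OneToOneField('main.UnifiedJob', on_delete=models.CASCADE,
                               related_name='launch_config')
    extra_data = models.JSONField(default=dict, blank=True)
    inventory = models.ForeignKey('main.Inventory', null=True, blank=True,
                                  on_delete=models.SET_NULL)
    credentials = models.ManyToManyField('main.Credential', blank=True)
```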
This keeps the instance from re-using a pool that might have already
expired and would be unusable for the inspector that needs to run as
part of the task manager.
Relates #264.
This PR proposed and implemented a way of defining workflow failure
state:
A workflow job fails if at least one of the conditions below is satisfied.
* At least one node ends in the `canceled` or `error` state.
* At least one leaf node ends in the `failed` state, but no child node is
spawned to run (no error handler).
Signed-off-by: Aaron Tan <jangsutsr@gmail.com>
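A hedged sketch of that rule; attribute names such as `node.job` and `node.children` are illustrative:

```python
DOWNED_STATES = ('canceled', 'error')

def workflow_job_failed(nodes):
    for node in nodes:
        if node.job is None:
            continue
        if node.job.status in DOWNED_STATES:
            return True
        if node.job.status == 'failed' and not node.children:
            return True  # failed leaf with no error-handler node spawned
    return False
```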
- prevent dual-entry of the first item in running jobs caused by the
setdefault syntax
- fix an issue where queues (celery tasks) only returned the last
item in the inspector output due to a looping problem
this caused reaper bugs in production
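The setdefault pitfall described above looks roughly like this (names are hypothetical):

```python
def record_running_job(running, queue_name, job):
    # buggy version: running.setdefault(queue_name, [job]).append(job)
    # inserts the first job twice, producing the dual-entry above
    # fixed version: seed with an empty list, then append exactly once
    running.setdefault(queue_name, []).append(job)
    return running

# record_running_job({}, 'tower', 'job-1') -> {'tower': ['job-1']}
```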