implement https://github.com/ansible/awx/issues/12446
In the development environment, enable a set of views that run
the task manager(s).
Also introduce a setting that disables any calls to schedule()
that do not originate from the debug views when in the development
environment. With guards on both the development-environment check and
the setting, I think we're pretty safe this won't get triggered
unintentionally.
Use MODE to determine whether we are in the development environment.
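A minimal sketch of the two guards, assuming a hypothetical setting name (`AWX_DISABLE_TASK_MANAGERS`) and view function; `settings.MODE` and `TaskManager` come from the source:

```python
from django.conf import settings
from django.http import Http404, HttpResponse

from awx.main.scheduler import TaskManager


def debug_task_manager(request):
    # Debug view: only exposed in the development environment.
    if settings.MODE != 'development':
        raise Http404
    TaskManager().schedule()
    return HttpResponse('task manager ran\n')


def maybe_schedule():
    # Called wherever schedule() would normally be triggered; skipped
    # when the operator drives the managers via the debug views.
    if settings.MODE == 'development' and getattr(
        settings, 'AWX_DISABLE_TASK_MANAGERS', False
    ):
        return
    TaskManager().schedule()
```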
Also, move the test for skipping task managers to the tasks file:
- DependencyManager spawns dependencies if necessary
- WorkflowManager processes running workflows to see if a new job is
ready to spawn
- TaskManager starts tasks if unblocked and has execution capacity
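For illustration, a sketch of a debug view body driving all three, assuming the managers live in awx.main.scheduler and share a schedule() entry point:

```python
from awx.main.scheduler import DependencyManager, TaskManager, WorkflowManager


def run_all_task_managers():
    # Order mirrors the list above: spawn dependencies, advance
    # workflows, then start any unblocked tasks with capacity.
    for manager_cls in (DependencyManager, WorkflowManager, TaskManager):
        manager_cls().schedule()
```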
* Under the new Postgres-backed notify/listen message queue, this never
actually worked. Without using the database to store state, we cannot
provide an at-most-once delivery mechanism with multiple readers.
* With this change, work is done ONLY on the node that requested the
work. Under RabbitMQ, the node that was first to get the message off
the queue would do the work; presumably the least busy node.
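A sketch of the node-pinning idea, assuming a per-hostname queue and an apply_async(queue=...) publisher as in awx.main.dispatch; `do_the_work` is purely illustrative:

```python
import socket

from awx.main.dispatch.publish import task


@task()
def do_the_work(job_id):
    # Runs on whichever node consumes the message.
    ...


# Publish to this node's own queue, so the node that requested the
# work is the only one that can pick it up.
do_the_work.apply_async(args=[42], queue=socket.gethostname())
```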
This commit implements the bulk of `awx-manage run_dispatcher`, a new
command that binds to RabbitMQ via kombu and balances messages across
a pool of workers that are similar to celeryd workers in spirit.
Specifically, this includes:
- a new decorator, `awx.main.dispatch.task`, which can be used to
  decorate functions or classes so that they can be designated as
  "Tasks" (see the usage sketch after this list)
- support for fanout/broadcast tasks (at this point in time, only
`conf.Setting` memcached flushes use this functionality)
- support for job reaping
- support for success/failure hooks for job runs (i.e.,
`handle_work_success` and `handle_work_error`)
- support for an auto-scaling worker pool that scales processes up and
  down on demand
- minimal support for RPC, such as status checks and pool recycle/reload
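A usage sketch of the decorator, assuming it is importable from awx.main.dispatch.publish and that 'tower_broadcast_all' is the shared fanout queue; the task bodies are illustrative:

```python
from awx.main.dispatch.publish import task


@task()
def add(a, b):
    # A plain function designated as a Task.
    return a + b


@task(queue='tower_broadcast_all')
class FlushSettingsCache:
    # Fanout/broadcast: every node bound to the queue runs this.
    def run(self):
        ...


# Serialize and publish; a pool worker deserializes and invokes it.
add.apply_async(args=[2, 2])
FlushSettingsCache.apply_async()
```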
* Based on the tower topology (Instance and InstanceGroup
relationships), have celery dynamically listen to queues on boot
* Add a celery task capable of "refreshing" which queues each celeryd
worker listens to; this will be used to support changes in the
topology (see the sketch after this list).
* Cleaned up some celery task definitions.
* Consolidated wrongly targeted job launch/finish messages onto the
'tower' queue, rather than a one-off queue.
* Dynamically route celery tasks destined for the local node
* Add support for running beat as a separate process
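A sketch of deriving a worker's queues from the topology, assuming the Instance model's rampart_groups relation; `queues_for` and `refresh_listening_queues` are illustrative names, not the actual task:

```python
from celery import current_app

from awx.main.models import Instance


def queues_for(hostname):
    instance = Instance.objects.get(hostname=hostname)
    # Per-host queue plus one queue per instance group membership.
    return [hostname] + [g.name for g in instance.rampart_groups.all()]


def refresh_listening_queues(hostname):
    # Celery's remote control API can bind new queues on a live worker,
    # so topology changes take effect without a restart.
    for queue in queues_for(hostname):
        current_app.control.add_consumer(
            queue, destination=['celery@%s' % hostname]
        )
```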
The task manager was doing work to compute currently consumed
capacity; this is moved into the model manager and applied in the same
form to the instance group list.
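A sketch of the manager-side computation, with illustrative names for the annotation and the reverse job relation:

```python
from django.db.models import Manager, Sum
from django.db.models.functions import Coalesce


class InstanceGroupManager(Manager):
    def with_consumed_capacity(self):
        # Annotate each group with the summed task_impact of its jobs,
        # so list views need no per-row Python loops.
        return self.get_queryset().annotate(
            consumed_capacity=Coalesce(Sum('unifiedjob__task_impact'), 0)
        )
```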
Running orphaned task cleanup within its own scheduled task via
celery-beat causes racy lock contention between the cleanup task and
the task scheduler. Unfortunately, the scheduler and the cleanup task
both run at similar intervals, so this race condition is fairly easy to
hit. At best, the scheduler is regularly delayed by 20 seconds;
depending on timing, task execution can be needlessly delayed by a
minute or more. At worst, the scheduler is never able to schedule
tasks.
This change implements the cleanup as a periodic block of code in the
scheduler itself that tracks its "last run" time in memcached (by
default, it performs a cleanup every 60 seconds).
see: #6534
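A sketch of the gating logic, assuming Django's cache API over memcached; the key and the cleanup method name are illustrative:

```python
import time

from django.core.cache import cache

CLEANUP_INTERVAL = 60  # seconds between cleanup passes


def cleanup_if_due(scheduler):
    # Runs inside the scheduler's own loop, so cleanup can never
    # contend with scheduling for the lock.
    last_run = cache.get('task_cleanup_last_run')
    now = time.time()
    if last_run is None or now - last_run >= CLEANUP_INTERVAL:
        cache.set('task_cleanup_last_run', now)
        scheduler.cleanup_inconsistent_celery_tasks()
```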
* Do not "trust" the list of celery ids for database entries that were
modified after the list of celery ids was gotten.
* err on the side of caution and just let the next heartbeat celery
killer try killing the task if it needs to be reaped.
* Do not "trust" the list of celery ids for database entries that were
modified after the list of celery ids was gotten.
* err on the side of caution and just let the next heartbeat celery
killer try killing the task if it needs to be reaped.
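A sketch of the caution check, with illustrative field names; the snapshot timestamp must be captured before the celery ids are listed:

```python
def reap_if_safe(job, active_celery_ids, snapshot_time):
    # The row changed after the celery-id snapshot was taken; its
    # celery id may no longer reflect reality, so leave it for the
    # next heartbeat pass to re-evaluate.
    if job.modified > snapshot_time:
        return
    if job.celery_task_id not in active_celery_ids:
        job.status = 'failed'
        job.job_explanation = (
            'Task was marked as running but was not present in celery.'
        )
        job.save(update_fields=['status', 'job_explanation'])
```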