When considering previous / current Project Updates, we weren't
properly sorting the previous runs.
We also now filter down to just "check"-style project updates
and don't consider 'run'-style standalone project updates when
deciding which project updates are potentially related.
* An instance has been lost, and jobs are still either running
or waiting inside its instance group.
* RBAC - the user does not have permission to see some of the
groups that would be used in the capacity calculation.
In some of these cases a naive capacity dictionary is returned; the
main goal is to not throw errors and to avoid unpredictable behavior.
Detailed capacity tests are moved into a new unit test file.
The task manager was doing work to compute currently consumed
capacity; this is moved into the manager and applied in the
same form to the instance group list.
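For the edge cases above, the behavior amounts to falling back to a zeroed-out entry instead of raising. A minimal sketch of that idea, assuming plain dictionaries for groups and tasks (the function and field names are illustrative, not the actual AWX code):

```python
# Illustrative sketch of the fallback described above; names and data
# shapes are assumptions, not the actual AWX implementation.
def capacity_dict(groups, visible_group_names):
    """Return {group_name: {'capacity': ..., 'consumed_capacity': ...}}."""
    result = {}
    for group in groups:
        if group['name'] not in visible_group_names or not group.get('instances'):
            # RBAC-restricted group, or its instances have been lost:
            # return a naive entry rather than raising an error.
            result[group['name']] = {'capacity': 0, 'consumed_capacity': 0}
            continue
        result[group['name']] = {
            'capacity': sum(i['capacity'] for i in group['instances']),
            'consumed_capacity': sum(t['impact'] for t in group.get('running_tasks', [])),
        }
    return result

# Example: one visible group and one the user cannot see.
groups = [
    {'name': 'tower', 'instances': [{'capacity': 100}],
     'running_tasks': [{'impact': 20}]},
    {'name': 'isolated', 'instances': [{'capacity': 50}]},
]
print(capacity_dict(groups, visible_group_names={'tower'}))
```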
* Reap running tasks on non-netsplit nodes
* Reap running tasks on known to be offline nodes
* Reap waiting tasks with no celery id anywhere if waiting >= 60 seconds
* Workflow jobs are virtual jobs that don't actually run. Thus they
won't have a celery id and aren't candidates for the generic reaping.
* Better error logging when Instance not found in reaping code.
related to https://github.com/ansible/ansible-tower/issues/6570 and https://github.com/ansible/ansible-tower/issues/6489
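A rough sketch of the reaping rules listed above, assuming simple job dicts (the field names and the 60-second threshold mirror the description; everything else is illustrative rather than the actual AWX reaper):

```python
import datetime

# Illustrative sketch of the reaping rules above; field names are
# assumptions, not the actual AWX implementation.
def should_reap(job, offline_hostnames, now=None):
    now = now or datetime.datetime.utcnow()
    if job['type'] == 'workflow_job':
        # Workflow jobs are virtual, never get a celery id, and are
        # therefore not candidates for generic reaping.
        return False
    if job['status'] == 'running' and job['execution_node'] in offline_hostnames:
        return True
    if (job['status'] == 'waiting' and not job.get('celery_task_id')
            and (now - job['created']).total_seconds() >= 60):
        return True
    return False
```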
* Ensure that all jobs a job depends on have finished (i.e. error,
success, failed) before starting the dependent job (see the sketch
after this list).
* Fixes a bug where too small a set of dependent jobs was chosen to
fail when an update_on_launch job fails.
* This fixes the bug of jobs starting before the Project Updates they
depend on (created via update_on_launch) have finished. This also
fixes similar bugs associated with inventory updates.
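The ordering rule in the first item above boils down to a check like the following (a sketch; the set of terminal statuses and the field names are assumptions):

```python
# Illustrative: a job may start only after everything it depends on has
# reached a terminal state. Status and field names are assumptions.
FINISHED = {'successful', 'failed', 'error'}

def dependencies_finished(job, jobs_by_id):
    return all(jobs_by_id[dep_id]['status'] in FINISHED
               for dep_id in job.get('dependent_on', []))
```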
Running orphaned task cleanup within its own scheduled task via
celery-beat causes a race-y lock contention between the cleanup task and
the task scheduler. Unfortunately, the scheduler and the cleanup task
both run at similar intervals, so this race condition is fairly easy to
hit. At best, it results in situations where the scheduler is
regularly delayed 20s; depending on timing, this can cause situations
where task execution is needlessly delayed a minute+. At worst, it can
result in situations where the scheduler is never able to schedule
tasks.
This change implements the cleanup as a periodic block of code in the
scheduler itself that tracks its "last run" time in memcached (by
default, it performs a cleanup every 60 seconds).
see: #6534
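Conceptually, the in-scheduler cleanup looks something like this minimal sketch, assuming Django's cache API in front of memcached (the cache key name and the run_cleanup callable are illustrative; the 60-second default comes from the description above):

```python
import time
from django.core.cache import cache  # memcached-backed in this setup

CLEANUP_INTERVAL = 60  # seconds; the default interval described above

def maybe_cleanup_orphaned_tasks(run_cleanup):
    """Run orphaned-task cleanup from inside the scheduler itself, at
    most once per interval, instead of as a separate celery-beat task."""
    last_run = cache.get('task_manager_cleanup_last_run')
    now = time.time()
    if last_run is None or now - last_run >= CLEANUP_INTERVAL:
        run_cleanup()
        cache.set('task_manager_cleanup_last_run', now)
```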
* Do not "trust" the list of celery ids for database entries that were
modified after the list of celery ids was fetched.
* Err on the side of caution and just let the next heartbeat celery
killer try killing the task if it needs to be reaped.
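In spirit, that guard amounts to skipping anything that changed after the snapshot was taken (a sketch; field names are assumptions):

```python
# Sketch only: skip reaping any job whose row was modified after the
# celery id snapshot was taken; the next heartbeat pass will handle it.
def reapable_jobs(jobs, celery_ids_snapshot, snapshot_time):
    for job in jobs:
        if job['modified'] > snapshot_time:
            # Changed after the snapshot: err on the side of caution.
            continue
        if job.get('celery_task_id') not in celery_ids_snapshot:
            yield job
```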
* Prefetch the related instance group for all job types (see the ORM
sketch after this list).
* Prefetch each inventory update's inventory source. The attribute
inventory_source.inventory_id is accessed when building the hash
tables of the running tasks.
* When generating dependencies (i.e. dynamically launching Project
Updates and Inventory Updates), only create the dynamic dependencies
if update_on_launch is True.
* Purging old task manager unit tests
* Migrating those tests to functional tests
* Updating fixtures and factories to support a change in the way the
task manager is tested
* Fix an issue with the mk_credential fixture when used in functional
tests. Previously it had trouble with multiple invocations when
persistence was used.
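With Django's ORM, the prefetching described in the items above would look roughly like this sketch (the querysets, model names, and related names are assumptions, not the exact task manager code):

```python
# Sketch of the prefetching idea; model and field names are assumptions.
from awx.main.models import UnifiedJob, InventoryUpdate

# Pull each running job's instance group in one extra query instead of
# one query per job.
running_jobs = (UnifiedJob.objects
                .filter(status__in=['running', 'waiting'])
                .prefetch_related('instance_group'))

# inventory_source.inventory_id is read while building the hash tables
# of running tasks, so fetch the inventory source alongside each update.
inventory_updates = (InventoryUpdate.objects
                     .filter(status='running')
                     .select_related('inventory_source'))
```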
* New InstanceGroup model and associative relationship with Instances
* Associative relationships between Organizations, Inventories, and Job
Templates and InstanceGroups (a minimal model sketch follows this list)
* Migrations for adding fields and tables for Instance Groups
* Adding activity stream reference for instance groups
* Task Manager Refactoring:
** Simplify task manager relationships and move away from the
interstitial hash tables
** Simplify dependency determination logic
** Reduce task manager runtime complexity by removing the partial
references and moving the logic into the task manager directly or
relying on Job model logic for determinism
* Associate the celery_id with the job at the earliest point possible.
This ensures that a waiting job has a celery id. Thus, we are free to
fail waiting jobs that don't have a celery id.
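As a rough illustration of the new InstanceGroup model and its associative relationships (a minimal Django sketch with assumed field options and names, not the shipped models or migrations):

```python
from django.db import models

# Minimal sketch of the relationships described above; not the actual
# AWX models, field options, or migrations.
class Instance(models.Model):
    hostname = models.CharField(max_length=250, unique=True)
    capacity = models.PositiveIntegerField(default=100)

class InstanceGroup(models.Model):
    name = models.CharField(max_length=250, unique=True)
    instances = models.ManyToManyField(Instance, related_name='groups')

class Organization(models.Model):
    name = models.CharField(max_length=512)
    instance_groups = models.ManyToManyField(InstanceGroup, blank=True)

class Inventory(models.Model):
    name = models.CharField(max_length=512)
    instance_groups = models.ManyToManyField(InstanceGroup, blank=True)

class JobTemplate(models.Model):
    name = models.CharField(max_length=512)
    instance_groups = models.ManyToManyField(InstanceGroup, blank=True)
```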