keep pre-upgrade events in an old table (instead of a partition)
- instead of creating a default partition, keep all events in special
"unpartitioned" tables
- track these tables via distinct proxy=True models
- when generating the queryset for a UnifiedJob's events, look at the
creation date of the job; if it's before the date of the migration,
query on the old unpartitioned table, otherwise use the more modern table
that provides auto-partitioning
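A rough sketch of that routing, assuming a hypothetical UnpartitionedJobEvent proxy model and a hypothetical EVENT_PARTITION_MIGRATION_DATE setting for the migration date (the real names may differ):

    # Sketch only; UnpartitionedJobEvent and EVENT_PARTITION_MIGRATION_DATE
    # are illustrative stand-ins for whatever the codebase actually uses.
    from django.conf import settings
    from awx.main.models import JobEvent, UnpartitionedJobEvent


    def get_event_queryset(unified_job):
        """Pick the right event table for a job based on when it was created."""
        if unified_job.created < settings.EVENT_PARTITION_MIGRATION_DATE:
            # The job predates the partition migration, so its events live in
            # the old unpartitioned table, reached through the proxy model.
            model = UnpartitionedJobEvent
        else:
            # Newer jobs write into the auto-partitioned table.
            model = JobEvent
        return model.objects.filter(job=unified_job)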
I was trying to parse the difference between this and the
(directly above) org_active_count from the comment, and then I
grepped and realized this function appears unused.
There is some history here.
https://github.com/ansible/awx/pull/7190 <- This PR was an attempt at fixing a
bug notting ran into where some jobs on k8s installs would get stuck in Waiting
forever.
The PR mentioned above introduced a bug where there are no instance groups on a
fresh k8s-based install. This is because queue / instance group registration
currently happens in the launch scripts, before the database is up.
With this patch, queue / instance group registration happens in the heartbeat,
right after auto-registering the instance.
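A minimal sketch of that ordering; register_instance() and register_default_queue() are hypothetical stand-ins for the real registration helpers:

    # Sketch only: the point is that queue / instance group registration now
    # runs inside the periodic heartbeat (after the DB is reachable), not in
    # the launch scripts.
    def cluster_node_heartbeat(hostname):
        # 1. Auto-register this node as an Instance if it is missing.
        instance = register_instance(hostname)

        # 2. Immediately afterwards, make sure the default queue / instance
        #    group exists and contains this instance.
        register_default_queue(instance)

        # 3. ...then the usual heartbeat bookkeeping (capacity, version, etc.).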
With AWX_AUTO_DEPROVISION_INSTANCES on, instances
are registered with an IP address. However, new
instances might try to register before old instances
are deprovisioned. In this case old IPs can conflict with
the new ones. This will check for an IP conflict and unset
the IP of the conflicting instance (set it to None).
ansible/awx issue 6750
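A rough sketch of the conflict check in the Django ORM; the Instance.ip_address field name is an assumption:

    # Sketch only: before registering a new instance with an IP, clear that
    # IP from any older, not-yet-deprovisioned instance that still holds it.
    from awx.main.models import Instance


    def register_with_ip(hostname, ip_address):
        Instance.objects.filter(ip_address=ip_address).exclude(
            hostname=hostname
        ).update(ip_address=None)

        instance, _ = Instance.objects.get_or_create(hostname=hostname)
        instance.ip_address = ip_address
        instance.save(update_fields=['ip_address'])
        return instance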
* The heartbeat of an instance is determined to be the last modified
time of the Instance object. Therefore, we want to be careful to only
update very specific fields of the Instance object.
If the first health check fails due to AMQP or celery issues, the
capacity will stay at the default of 100 (which is confusing);
see: https://github.com/ansible/tower/issues/2085
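One way to honor both points in Django (a sketch, not the codebase's exact helpers): use queryset .update() calls so that only the intended columns are written and the modified/heartbeat timestamp is only bumped on purpose:

    # Sketch only.  QuerySet.update() issues a direct UPDATE and bypasses
    # save()/auto_now, so `modified` (the heartbeat) changes only when we
    # set it explicitly.
    from django.utils.timezone import now
    from awx.main.models import Instance


    def record_heartbeat(hostname):
        # A real heartbeat: bump `modified` deliberately.
        Instance.objects.filter(hostname=hostname).update(modified=now())


    def set_capacity(hostname, capacity):
        # Not a heartbeat: write only the capacity column; `modified` and
        # every other field are left untouched.
        Instance.objects.filter(hostname=hostname).update(capacity=capacity)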
This also relaxes some of the task manager rules on Instance Groups
throughout the full stack, so that workflow jobs tend to shortcut the
processing or omit it altogether.
This lets the workflow job spawning logic exist outside of the
instance group queues, which it doesn't need to participate in in the
first place.
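Illustrative only, with hypothetical names: the kind of short-circuit this implies in the task manager loop:

    # Sketch only: workflow jobs spawn child jobs but consume no capacity
    # themselves, so they skip instance-group selection entirely.
    def process_pending_task(task, instance_groups, start_task):
        if task.is_workflow_job:
            # No queue, no instance, no capacity bookkeeping.
            start_task(task, instance_group=None, instance=None)
            return

        # Everything else goes through normal instance-group selection.
        for group in instance_groups:
            if group.remaining_capacity >= task.task_impact:
                start_task(task, instance_group=group,
                           instance=group.instances.first())  # placeholder pick
                return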
* Was considering an isolated instance to be any instance that has at least 1
group with a controller. This is technically correct since an iso node
cannot be a part of a non-iso group.
* The query is now more robust and considers a node an iso node only if ALL
of the groups it belongs to have a controller (see the sketch after this list).
* Also added better debugging for the special tower instance group
* Added a check for the existence of the special tower group so that
logs are less "messy" during the install process.
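A hedged sketch of that query in the Django ORM, assuming instance groups are reachable from Instance via a rampart_groups relation with a nullable controller foreign key (names are assumptions):

    # Sketch only.  A node counts as isolated when every group it belongs to
    # has a controller; membership in even one controller-less group makes it
    # a regular node.
    from awx.main.models import Instance

    isolated_instances = (
        Instance.objects
        .filter(rampart_groups__isnull=False)              # must have at least one group
        .exclude(rampart_groups__controller__isnull=True)  # and no controller-less groups
        .distinct()
    )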
* Nodes are marked offline and then, given enough time, deleted. Nodes can
come back for various reasons (e.g. a netsplit). When they come back,
have them recreate the node's Instance record if AWX_AUTO_DEPROVISION_INSTANCES
is True. Otherwise, do nothing. The do-nothing case will show up in the
logs as celery job tracebacks as the nodes fail to be self-aware.
* This also adds fields to the instance view for tracking cpu and
memory usage as well as information on what the capacity ranges are
* Also adds a flag for enabling/disabling instances, which removes them
from all queues and stops them from processing new work
* The capacity is now based almost exclusively on a value derived from
forks
* capacity_adjustment allows you to commit an instance to a certain
number of forks, CPU-focused or memory-focused (see the sketch below)
* Each job run adds a single fork of overhead (that's the reasoning
behind the +1)
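A hedged sketch of the fork-based capacity math; the constants are made up and the interpolation is illustrative, not the exact production formula:

    # Sketch only: capacity is expressed in forks.  capacity_adjustment slides
    # between a memory-focused estimate (0.0) and a CPU-focused one (1.0).
    FORKS_PER_CPU = 4       # example constant
    MEM_PER_FORK_MB = 100   # example constant


    def cpu_capacity(cpu_count):
        return cpu_count * FORKS_PER_CPU


    def mem_capacity(mem_mb):
        return max(1, mem_mb // MEM_PER_FORK_MB)


    def instance_capacity(cpu_count, mem_mb, capacity_adjustment=1.0):
        mem_cap = mem_capacity(mem_mb)
        cpu_cap = cpu_capacity(cpu_count)
        # 0.0 commits the instance to the memory-focused number of forks,
        # 1.0 to the CPU-focused number; values in between interpolate.
        return int(mem_cap + (cpu_cap - mem_cap) * capacity_adjustment)


    def task_impact(job_forks):
        # Every job run costs its forks plus one fork of overhead -- the "+1".
        return job_forks + 1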
* Associating/Disassociating an instance with a group
* Triggering a topology rebuild on that change
* Force rabbitmq cleanup of offline nodes
* Automatically check for dependent service startup
* Fetch and set the hostname for celery so it doesn't clobber other
celery workers
* Rely on celery init signals to dynamically set listen queues (see the
sketch below)
* Removing old total_capacity instance manager property
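For the dynamic listen queues, a minimal sketch using celery's worker-setup signal (the per-hostname queue naming is an assumption):

    # Sketch only: add a queue named after this node at worker startup instead
    # of hard-coding it in the config.
    from celery.signals import celeryd_after_setup


    @celeryd_after_setup.connect
    def add_dynamic_listen_queues(sender, instance, **kwargs):
        # `sender` is the worker nodename (e.g. "celery@awx-node-1");
        # `instance` is the Worker, whose app exposes the queue registry.
        hostname = sender.split('@', 1)[-1]
        instance.app.amqp.queues.select_add(hostname)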
* For Tower the license must match between the source and destination
* For AWX the check is disabled
* Hosts imported from another Tower don't count against your license
in the local Tower
* Fix up some issues with enablement
* Prevent slashes from being used in the instance filter
* Add &all=1 filter to make sure we pick up all hosts
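A sketch of the last two bullets, assuming the source Tower's inventory is fetched from its inventory script endpoint (the exact path and auth are assumptions):

    # Sketch only: reject slashes in the instance filter and request the
    # remote inventory with all=1 so no hosts are silently dropped.
    import requests


    def fetch_remote_inventory(base_url, inventory_id, token, instance_filter=''):
        if '/' in instance_filter:
            # A slash would let the filter value escape into the URL path.
            raise ValueError('slashes are not allowed in the instance filter')

        url = f'{base_url}/api/v2/inventories/{inventory_id}/script/'
        resp = requests.get(
            url,
            params={'hostvars': 1, 'all': 1},  # all=1: include every host
            headers={'Authorization': f'Bearer {token}'},
            timeout=30,
        )
        resp.raise_for_status()
        # (The validated instance_filter would then be applied to these hosts.)
        return resp.json()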
An instance has been lost, and jobs are still either running
or waiting inside of its instance group.
RBAC - the user does not have permission to see some of the
groups that would be used in the capacity calculation.
For some cases, a naive capacity dictionary is returned; the
main goal is to not throw errors and to avoid unpredictable behavior.
Detailed capacity tests are moved into a new unit test file.
The task manager was doing work to compute currently consumed
capacity; this is moved into the manager and applied in the
same form to the instance group list (see the sketch below).
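A hedged sketch of what "moved into the manager" looks like, with illustrative names, so the task manager and the instance group list API share one implementation:

    # Sketch only: compute consumed capacity in one place so both consumers
    # apply it identically.  Field names are assumptions.
    from django.db import models


    class InstanceGroupManager(models.Manager):
        def capacity_values(self, tasks):
            """Return {group_name: {'consumed_capacity': forks}} for active tasks."""
            consumed = {}
            for task in tasks:
                if task.status not in ('running', 'waiting'):
                    continue
                entry = consumed.setdefault(task.instance_group_name,
                                            {'consumed_capacity': 0})
                entry['consumed_capacity'] += task.task_impact
            return consumed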
* filter_host_string now returns a queryset instead of a Q() object. This change
updates the Host manager to remove the .filter() call, since the result
returned from filter_host_string() has already been converted from a Q() to a queryset.
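Roughly the shape of that change, as an illustrative before/after (the name filter here stands in for the real host-filter parsing):

    from django.db.models import Q
    from awx.main.models import Host


    # Before (illustrative): the helper returned a Q() that callers had to apply.
    def filter_host_string_old(filter_string):
        return Q(name__icontains=filter_string)

    hosts = Host.objects.filter(filter_host_string_old('web'))


    # After (illustrative): the helper returns a queryset, so the Host manager
    # no longer wraps the result in another .filter() call.
    def filter_host_string(filter_string):
        return Host.objects.filter(name__icontains=filter_string)

    hosts = filter_host_string('web')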