62 Commits

Author SHA1 Message Date
Ryan Petrello
73baf3fcf9 defer loading Job.artifacts on host views to improve performance
see: https://github.com/ansible/awx/issues/8006

this data can be *really* large, and we don't actually need it for the summary fields on this API endpoint
2020-08-26 13:34:15 -04:00
Bill Nottingham
a33c303765 Remove active_counts_by_org
I was trying to parse the difference between this and the
(directly above) org_active_count from the comment, and then I
grepped and realized this function appears unused.
2020-06-10 15:25:14 -04:00
Shane McDonald
91dbc2de30 Add queue / instance group registration to heartbeat for k8s installs
There is some history here.

https://github.com/ansible/awx/pull/7190 <- This PR was an attempt at fixing a
bug notting ran into where some jobs on k8s installs would get stuck in Waiting
forever.

The PR mentioned above introduced a bug where there are no instance groups on a
fresh k8s-based install. This is because this process currently happens in the
launch scripts, before the database is up.

With this patch, queue / instance group registration happens in the heartbeat,
right after auto-registering the instance.
2020-06-06 08:58:35 -04:00
Seth Foster
6652464e25 Unset old instance IP when conflicting new instance IP
With AWX_AUTO_DEPROVISION_INSTANCES on, instances
are registered with an ip address. However, new
instances might try to register before old instances
are deprovisioned. In this case old IPs can conflict with
the new ones. This will check for an IP conflict and unset
the IP of the conflicting instance (set it to None).

ansible/awx issue 6750
2020-04-28 10:52:15 -04:00
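
The conflict handling described in 6652464e25 can be sketched roughly as follows. This is a minimal in-memory illustration, not the AWX implementation: the `Instance` dataclass and the `register_instance` helper are hypothetical names invented for the example, and the real code works against database models.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Instance:
    # Hypothetical stand-in for the real Instance model.
    hostname: str
    ip_address: Optional[str] = None


def register_instance(instances: List[Instance], hostname: str, ip_address: str) -> Instance:
    """Register or update an instance, unsetting the IP of any other instance that holds it."""
    # An old instance that has not been deprovisioned yet may still hold this IP.
    for other in instances:
        if other.hostname != hostname and other.ip_address == ip_address:
            other.ip_address = None  # unset the conflicting IP (set to None)

    # Update the existing record if the hostname is already known.
    for inst in instances:
        if inst.hostname == hostname:
            inst.ip_address = ip_address
            return inst

    # Otherwise create a new record.
    inst = Instance(hostname, ip_address)
    instances.append(inst)
    return inst
```

For example, registering `new-pod` with an IP still held by `old-pod` leaves `old-pod` with `ip_address` set to `None`.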
chris meyers
0a1070834d only update the ip address field on the instance
* The heartbeat of an instance is determined to be the last modified
time of the Instance object. Therefore, we want to be careful to only
update very specific fields of the Instance object.
2020-03-19 10:01:20 -04:00
Shane McDonald
45ce6d794e Initial migration of rabbitmq -> redis for k8s installs 2020-03-18 16:10:17 -04:00
Nikhil Jain
9eecb24c32 fix smart inventory duplicate hosts 2020-02-28 09:46:44 +05:30
Shane McDonald
bd5003ca98 Task manager / scheduler Kubernetes integration 2019-10-04 13:21:21 -04:00
Jeff Bradberry
6399ec59c9 Include in the API the count of hosts used by an organization 2019-02-28 15:54:09 -05:00
Ryan Petrello
2927803a82 fix overindent lint failures 2019-01-30 12:12:39 -05:00
Ryan Petrello
195aff37ad default instance capacity to zero at registration/insertion time
if the first health check fails due to AMQP or celery issues, the
capacity will stay at the default of 100 (which is confusing)

see: https://github.com/ansible/tower/issues/2085
2018-06-07 11:42:31 -04:00
chris meyers
97ab6449b9 parallelize test running 2018-05-16 14:29:15 -04:00
Matthew Jones
4af8a53232 Remove Instance Group concept/usage from WorkflowJobs
This also relaxes some of the task manager rules on Instance Groups
down the full stack such that workflow jobs tend to shortcut the
processing or omit it altogether.

This lets the workflow job spawning logic exist outside of the
instance group queues, which it doesn't need to participate in in the
first place.
2018-04-25 08:29:49 -04:00
chris meyers
c3100afd0e fixed isolated instance query
* Was considering an isolated instance: any instance that has at least 1
group with no controller. This is technically correct since an iso node
can not be a part of a non-iso group.
* The query is now more robust and considers a node an iso node if ALL
groups that the node belongs to have a controller.
* Also added better debugging for the special tower instance group
* Added a check for the existence of the special tower group so that
logs are less "messy" during the install process.
2018-04-03 13:50:57 -04:00
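
The revised rule in c3100afd0e (a node is isolated only if ALL of its groups have a controller) can be illustrated in plain Python. The `Group` and `Node` classes are hypothetical stand-ins for the real models. Treating a node with no groups as non-isolated is an assumption made here for the sketch, since the commit message does not cover that case.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Group:
    # Hypothetical stand-in for an instance group; controller is None for non-iso groups.
    name: str
    controller: Optional[str] = None


@dataclass
class Node:
    # Hypothetical stand-in for an instance and the groups it belongs to.
    hostname: str
    groups: List[Group] = field(default_factory=list)


def is_isolated(node: Node) -> bool:
    """Return True only if every group the node belongs to has a controller."""
    # Assumption: a node that belongs to no groups is not considered isolated.
    if not node.groups:
        return False
    return all(group.controller is not None for group in node.groups)
```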
chris meyers
838b723c73 add all instances to special tower instance group
* All instances except isolated instances
* Also, prevent any tower attributes from being modified via the API
2018-03-29 16:47:52 -04:00
chris meyers
eef6f7ecb0 delay looking up settings SYSTEM_UUID 2018-03-28 09:54:51 -04:00
chris meyers
7ce8907b7b reregister node when they come back online
* Nodes are marked offline, then deleted; given enough time. Nodes can
come back for various reasons (e.g. netsplit). When they come back,
have them recreate the node Instance if AWX_AUTO_DEPROVISION_INSTANCES
is True. Otherwise, do nothing. The do nothing case will show up in the
logs as celery job tracebacks as they fail to be self aware.
2018-03-27 14:30:47 -04:00
Matthew Jones
70bf78e29f Apply capacity algorithm changes
* This also adds fields to the instance view for tracking cpu and
  memory usage as well as information on what the capacity ranges are
* Also adds a flag for enabling/disabling instances which removes them
  from all queues and has them stop processing new work
* The capacity is now based almost exclusively on some value relative
  to forks
* capacity_adjustment allows you to commit an instance to a certain
  amount of forks, cpu focused or memory focused
* Each job run adds a single fork overhead (that's the reasoning
  behind the +1)
2018-02-01 16:57:09 -05:00
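
The "+1" mentioned in 70bf78e29f can be read as follows. This is a sketch under stated assumptions: the commit message only says that capacity is relative to forks and that each job run adds a single fork of overhead. The function names and the subtraction-based bookkeeping are illustrative and not taken from the AWX code.

```python
from typing import Iterable


def job_impact(forks: int) -> int:
    """Capacity consumed by one job run: its forks plus one fork of overhead."""
    return forks + 1


def remaining_capacity(capacity: int, running_job_forks: Iterable[int]) -> int:
    """Capacity left on an instance after accounting for its running jobs.

    Assumption: consumed capacity is the sum of each job's impact.
    """
    return capacity - sum(job_impact(forks) for forks in running_job_forks)
```

With a capacity of 20 and two running jobs of 5 forks each, the consumed capacity is (5 + 1) + (5 + 1) = 12, leaving 8.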
Matthew Jones
624289bed7 Add support for directly managing instance groups
* Associating/Disassociating an instance with a group
* Triggering a topology rebuild on that change
* Force rabbitmq cleanup of offline nodes
* Automatically check for dependent service startup
* Fetch and set hostname for celery so it doesn't clobber other
  celeries
* Rely on celery init signals to dynamically set listen queues
* Removing old total_capacity instance manager property
2018-02-01 16:46:44 -05:00
AlanCoding
98f8faa349 simplify query for active_count 2017-12-14 09:53:26 -05:00
Matthew Jones
5f3ebc26e0 Adding license checks for Tower inventory source
* For Tower the license must match between the source and destination
* For AWX the check is disabled
* Hosts imported from another Tower don't count against your license
  in the local Tower
* Fix up some issues with enablement
* Prevent slashes from being used in the instance filter
* Add &all=1 filter to make sure we pick up all hosts
2017-10-27 08:12:14 -04:00
AlanCoding
ff96a750e1 backend of org-scoped smart inventories 2017-08-29 16:43:56 -04:00
AlanCoding
d54eb93f26 Handle capacity algorithm corner cases
Instance has gone lost, and jobs are still either running
or waiting inside of its instance group
RBAC - user does not have permission to see some of the
groups that would be used in the capacity calculation

For some cases, a naive capacity dictionary is returned,
main goal is to not throw errors and avoid unpredicted behavior

Detailed capacity tests are moved into new unit test file.
2017-08-28 16:12:12 -04:00
AlanCoding
5327a4c622 Use global capacity algorithm in serializer
The task manager was doing work to compute currently consumed
capacity, this is moved into the manager and applied in the
same form to the instance group list.
2017-08-28 12:07:47 -04:00
AlanCoding
ce3c969c08 correct capacity algorithm for task manager 2017-08-26 11:59:25 -04:00
Wayne Witzel III
f8c2b466a8 sometimes core_filters is not an attribute, so just set it to empty instead of pop 2017-08-21 06:03:37 -04:00
Wayne Witzel III
eb6a27653f Adjust HostManager and update summary host query 2017-08-21 04:01:58 -04:00
Wayne Witzel III
c352ea7596 Update HostManager to return only a single matching hostname for SmartInventory filter 2017-08-21 02:17:27 -04:00
Wayne Witzel III
d652ed16d0 Dynamic -> Smart Inventory 2017-05-17 16:25:40 -04:00
Wayne Witzel III
477956ec30 Add distinct to HostManager for DynamicFilter results 2017-05-15 12:20:17 -04:00
Chris Meyers
cc4692932a update Host manager to handle Query Sets
* filter_host_string now returns a query set instead of Q(). This change
updates the Host manager to remove the .filter() call since the result
returned from filter_host_string() has already been turned from Q() -> Query Set
2017-05-12 12:14:32 -04:00
Wayne Witzel III
e46a043213 Revert isinstance, circular imports due to when the HostManager exists for an Inventory 2017-05-02 13:16:53 -04:00
Wayne Witzel III
af35838aff Make kind read-only for PUT/PATCH, use isinstance in Host Manager, update field falsy check 2017-05-02 13:00:17 -04:00
Wayne Witzel III
1750e5bd2a Refactor Host manager and dynamic Inventory tests and update
validation/serialization
2017-05-01 13:06:49 -04:00
Wayne Witzel III
8a599d9754 Add Inventory.kind field 2017-05-01 12:55:42 -04:00
Wayne Witzel III
a45d41b379 DynamicFilterQuerySet -> DynamicFilter 2017-05-01 12:55:42 -04:00
Wayne Witzel III
17e9b3057e Clean-up initial commit for Host filter / DynamicInventory 2017-05-01 12:55:42 -04:00
Aaron Tan
9e4655419e Fix flake8 E302 errors. 2016-11-15 20:59:39 -05:00
Matthew Jones
30984a3a79 Fix up flake8 errors 2016-11-10 13:07:48 -05:00
Matthew Jones
78b8876ed9 Support expiring of capacity if a node is down
For a certain amount of time
2016-11-10 09:52:04 -05:00
Matthew Jones
343966f744 Implement gathering overall task capacity
For use when running/planning jobs
2016-11-07 13:45:01 -05:00
Matthew Jones
3d8eb48986 Test for HA license before allowing cluster job start 2016-10-20 16:50:49 -04:00
Matthew Jones
b906469b40 Add execution node information 2016-10-20 13:27:36 -04:00
Chris Meyers
203df91a5d more robust test mode checking 2016-10-03 09:28:44 -04:00
Chris Church
b7a6aa01a3 Fixes to get flake8 and unit/functional tests passing. 2016-09-18 19:11:29 -04:00
Matthew Jones
f3a8eb9daf Merge pull request #3509 from ansible/ha_installer
Improvements to the setup/installer to support new HA workflows
2016-09-16 15:53:38 -04:00
Matthew Jones
a4ec0739ea Temporarily disable instance id gathering
During requests
2016-09-16 15:21:42 -04:00
Matthew Jones
3de4aae548 Fixing up HA induced flake8 issues 2016-09-15 13:51:17 -04:00
Matthew Jones
0c1e1fa2fb Refactor Tower HA Instance logic and models
* Gut the HA middleware
* Purge concept of primary and secondary.
* UUID is not the primary host identifier, now it's based mostly on the
  username.  Some work probably still left to do to make sure this is
  legit.  Also removed unique constraint from the uuid field.  This
  might become the cluster ident now... or it may just deprecate
* No more secondary -> primary redirection
* Initial revision of /api/v1/ping
* Revise and gut tower-manage register_instance
* Rename awx/main/socket.py to awx/main/socket_queue.py to prevent
  conflict with the "socket" module from python base
* Revisit/gut the Instance manager... not sure if this manager is really
  needed anymore
2016-09-08 13:37:53 -04:00
Akita Noek
6ea99583da Mass active flag code removal 2016-03-15 09:29:55 -04:00