43 Commits

Author SHA1 Message Date
chris meyers
b94cf379f6 do not choose offline instances 2018-06-04 10:06:59 -04:00
chris meyers
e720fe5dd0 decide the node a job will run early
* Deciding the Instance that a Job runs on at celery task run-time makes
it hard to evenly distribute tasks among Instances. Instead, the task
manager will look at the world of running jobs and choose an instance
node to run on, applying a deterministic job distribution algorithm.
2018-06-04 10:06:59 -04:00
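Below is a minimal Python sketch of choosing a node up front from the task manager's view of running jobs. The commit does not describe the actual distribution algorithm. The selection rule here is a hypothetical illustration: most remaining capacity wins, with ties broken by hostname. The `Node` and `choose_node` names are also hypothetical. The `online` filter mirrors the later commit "do not choose offline instances".

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    # Hypothetical stand-in for an Instance as seen by the task manager.
    hostname: str
    capacity: int
    consumed: int  # capacity already used by running jobs
    online: bool = True


def choose_node(nodes: List[Node], impact: int) -> Optional[Node]:
    """Hypothetical policy: among online nodes that can fit the task, pick the
    one with the most remaining capacity; ties are broken by hostname so the
    same view of running jobs always yields the same choice."""
    candidates = [
        n for n in nodes if n.online and n.capacity - n.consumed >= impact
    ]
    if not candidates:
        return None  # nothing can take the task right now
    return min(candidates, key=lambda n: (-(n.capacity - n.consumed), n.hostname))
```

Because the choice depends only on the snapshot of running jobs, two task manager runs over the same state pick the same node.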
Ryan Petrello
4abac44411 prevent unicode in instance hostnames and instance group names
see: https://github.com/ansible/tower/issues/1721
2018-05-18 16:28:43 -04:00
Matthew Jones
05419d010b Update group cluster policies on save, not just created
Currently, updating policy settings doesn't trigger a re-evaluation of
instance group policies; this makes sure we re-evaluate in the event
that anything changes.
2018-04-24 21:40:11 -04:00
chris meyers
c3100afd0e fixed isolated instance query
* Previously considered an isolated instance to be any instance that has
at least 1 group with no controller. This is technically correct since an
iso node cannot be a part of a non-iso group.
* The query is now more robust and considers a node an iso node if ALL
groups that the node belongs to have a controller.
* Also added better debugging for the special tower instance group
* Added a check for the existence of the special tower group so that
logs are less "messy" during the install process.
2018-04-03 13:50:57 -04:00
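The "ALL groups have a controller" rule can be sketched in plain Python. This is an illustration, not the Django query from the commit. The `Group` and `Instance` dataclasses and `is_isolated` are hypothetical stand-ins for the real models. Treating an instance with no groups as not isolated is an added assumption, because `all()` over an empty list is `True`.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Group:
    # Hypothetical stand-in for an instance group; `controller` is set only
    # for isolated groups.
    name: str
    controller: Optional[str] = None


@dataclass
class Instance:
    hostname: str
    groups: List[Group] = field(default_factory=list)


def is_isolated(instance: Instance) -> bool:
    """An instance is isolated only if ALL of its groups have a controller.
    Assumption: an instance with no groups at all is not treated as isolated."""
    return bool(instance.groups) and all(
        g.controller is not None for g in instance.groups
    )
```

A node that sits in one controlled group and also in the plain `tower` group is not isolated under this rule.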
Chris Meyers
47fa99d3ad Merge pull request #1154 from chrismeyersfsu/enhancement-tower_in_all_groups
add all instances to special tower instance group
2018-04-02 09:39:04 -04:00
chris meyers
838b723c73 add all instances to special tower instance group
* All instances except isolated instances
* Also, prevent any tower attributes from being modified via the API
2018-03-29 16:47:52 -04:00
chris meyers
8438331563 make jobs_running more rich in OPTIONS
* Expose jobs_running as an IntegerField
2018-03-28 16:01:24 -04:00
AlanCoding
7881c921ac block deletion of resources with unprocessed events 2018-03-16 10:14:28 -04:00
chris meyers
5d5d8152c5 prevent instance group delete if running jobs
* related to https://github.com/ansible/ansible-tower/issues/7936
2018-03-15 14:25:49 -04:00
Matthew Jones
70bf78e29f Apply capacity algorithm changes
* This also adds fields to the instance view for tracking cpu and
  memory usage as well as information on what the capacity ranges are
* Also adds a flag for enabling/disabling instances which removes them
  from all queues and has them stop processing new work
* The capacity is now based almost exclusively on some value relative
  to forks
* capacity_adjustment allows you to commit an instance to a certain
  number of forks, CPU-focused or memory-focused
* Each job run adds a single fork overhead (that's the reasoning
  behind the +1)
2018-02-01 16:57:09 -05:00
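The capacity rules above can be sketched in Python. Only the "+1 per job" fork overhead comes from the commit message. The linear blend used for `capacity_adjustment` is an assumption for illustration. So are the helper names `job_impact`, `adjusted_capacity`, and `remaining_capacity`.

```python
from typing import List


def job_impact(forks: int) -> int:
    # Each job run adds a single fork of overhead: the "+1" in the commit.
    return forks + 1


def adjusted_capacity(cpu_capacity: int, mem_capacity: int, adjustment: float) -> int:
    # Assumption: capacity_adjustment blends linearly between the smaller and
    # larger of the CPU- and memory-based estimates (0.0 -> smaller, 1.0 -> larger).
    low, high = sorted((cpu_capacity, mem_capacity))
    return low + int((high - low) * adjustment)


def remaining_capacity(capacity: int, running_forks: List[int]) -> int:
    # Subtract the impact of every running job from the instance's capacity.
    return capacity - sum(job_impact(f) for f in running_forks)
```

For example, an instance with capacity 40 running jobs with 5 and 10 forks has 40 - 6 - 11 = 23 left.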
Matthew Jones
6e9930a45f Use on_commit hook for triggering ig policy
* Also apply console handlers to loggers for the dev environment
2018-02-01 16:56:43 -05:00
Matthew Jones
d9e774c4b6 Updates for automatic triggering of policies
* Switch policy router queue to not be "tower" so that we don't
  fall into a chicken/egg scenario
* Show fixed policy list in serializer so a user can determine if
  an instance is manually managed
* Change IG membership mixin to not directly handle applying topology
  changes. Instead it just makes sure the policy instance list is
  accurate
* Add create/delete hooks for instances and groups to trigger policy
  re-evaluation
* Update policy algorithm for fairer distribution
* Fix an issue where CELERY_ROUTES wasn't renamed after celery/django
  upgrade
* Update unit tests to be more explicit
* Update count calculations used by algorithm to only consider
  non-manual instances
* Adding unit tests and fixture
* Don't propagate logging messages from awx.main.tasks and
  awx.main.scheduler
* Use advisory lock to prevent policy eval conflicts
* Allow updating instance groups from view
2018-02-01 16:56:16 -05:00
Matthew Jones
56abfa732e Adding initial instance group policies
and policy evaluation planner
2018-02-01 16:56:15 -05:00
Chris Meyers
c9ff3e99b8 celeryd attach to queues dynamically
* Based on the tower topology (Instance and InstanceGroup
relationships), have celery dynamically listen to queues on boot
* Add celery task capable of "refreshing" what queues each celeryd
worker listens to. This will be used to support changes in the topology.
* Cleaned up some celery task definitions.
* Converged wrongly targeted job launch/finish messages onto the 'tower'
queue, rather than a 1-off queue.
* Dynamically route celery tasks destined for the local node
* separate beat process

add support for separate beat process
2018-02-01 16:37:33 -05:00
AlanCoding
5327a4c622 Use global capacity algorithm in serializer
The task manager was doing work to compute currently consumed
capacity; this is moved into the manager and applied in the
same form to the instance group list.
2017-08-28 12:07:47 -04:00
AlanCoding
1112557c79 set capacity to 0 if instance has not checked in lately 2017-07-27 16:20:04 -04:00
Matthew Jones
c7a85d9738 Mass rename from ansible_(awx|tower) -> (awx|tower) 2017-07-26 13:33:26 -04:00
AlanCoding
dd1a261bc3 setup playbook and heartbeat for isolated deployments
* Allow isolated_group_ use in setup playbook
* Tweaks to host/queue registration commands complementing setup
* Create isolated heartbeat task and check capacity
* Add content about isolated instances to acceptance docs
2017-06-19 12:13:36 -04:00
Ryan Petrello
422950f45d Support for executing job and adhoc commands on isolated Tower nodes (#6524) 2017-06-14 11:47:30 -04:00
Aaron Tan
604243428c Add URL and type fields to instances/instance groups. 2017-05-30 17:00:27 -04:00
Matthew Jones
705f8af440 Update views and serializers to support instance group (ramparts)
* includes top level views for instances and instance groups and
  extending those views to be able to view running jobs
* Associative endpoints on Organizations, Inventories, and Job
  Templates
* Related and summary field entries where appropriate
* Adding job model references to executing instance group
* Fix up default queue properties for clustering from the settings file
* Update production and default settings for instance queues in settings
2017-05-10 12:33:03 -04:00
Matthew Jones
4ced911c00 Implementing models for instance groups, updating task manager
* New InstanceGroup model and associative relationship with Instances
* Associative instances between Organizations, Inventory, and Job
  Templates and InstanceGroups
* Migrations for adding fields and tables for Instance Groups
* Adding activity stream reference for instance groups
* Task Manager Refactoring:
** Simplify task manager relationships and move away from the
   interstitial hash tables
** Simplify dependency determination logic
** Reduce task manager runtime complexity by removing the partial
   references and moving the logic into the task manager directly or
   relying on Job model logic for determinism
2017-05-10 12:32:54 -04:00
Matthew Jones
ea8b78ca49 Protect cluster nodes after an upgrade
* Modify instance model to contain a version number for the node
* Update that version number during the heartbeat
* If during a heartbeat any of the nodes are of a newer version, then
  shut down the current node.

The idea behind this is that if all nodes were upgraded at the same
time, then at the moment of the health check they should all be at the
newer version. Otherwise, we put the system in a state where it can
receive the upgrade but stay down until that happens. During the setup
playbook run, the services will be fully restarted.
2017-04-10 15:37:33 -04:00
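The heartbeat check described above can be sketched in Python. This assumes versions are simple dotted numeric strings. The `parse_version` and `should_shut_down` names are hypothetical and do not come from the codebase.

```python
from typing import Iterable, Tuple


def parse_version(v: str) -> Tuple[int, ...]:
    # Assumes plain dotted numeric versions such as "3.1.2".
    return tuple(int(part) for part in v.split("."))


def should_shut_down(my_version: str, peer_versions: Iterable[str]) -> bool:
    """During a heartbeat, the current node should shut down if any peer
    reports a newer version than its own."""
    mine = parse_version(my_version)
    return any(parse_version(v) > mine for v in peer_versions)
```

Comparing integer tuples rather than raw strings makes "3.10.0" rank above "3.9.0", which plain string comparison gets wrong.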
Aaron Tan
9e4655419e Fix flake8 E302 errors. 2016-11-15 20:59:39 -05:00
Matthew Jones
343966f744 Implement gathering overall task capacity
For use when running/planning jobs
2016-11-07 13:45:01 -05:00
Chris Meyers
25b85c4a0b rename scheduler config singleton 2016-11-01 14:07:00 -05:00
Chris Meyers
13c89ab78c HAify job schedules and more task_manager renaming 2016-11-01 13:50:42 -05:00
Matthew Jones
3de4aae548 Fixing up HA induced flake8 issues 2016-09-15 13:51:17 -04:00
Matthew Jones
0c1e1fa2fb Refactor Tower HA Instance logic and models
* Gut the HA middleware
* Purge concept of primary and secondary.
* UUID is no longer the primary host identifier; now it's based mostly on the
  username.  Some work probably still left to do to make sure this is
  legit.  Also removed unique constraint from the uuid field.  This
  might become the cluster ident now... or it may just be deprecated
* No more secondary -> primary redirection
* Initial revision of /api/v1/ping
* Revise and gut tower-manage register_instance
* Rename awx/main/socket.py to awx/main/socket_queue.py to prevent
  conflict with the "socket" module from python base
* Revisit/gut the Instance manager... not sure if this manager is really
  needed anymore
2016-09-08 13:37:53 -04:00
John Mitchell
32d1c0e4db fixed copyright date 2015-06-11 16:10:23 -04:00
Matthew Jones
31d0342d41 More copyright headers for api side stuff 2015-05-29 12:10:40 -04:00
Matthew Jones
b3da3b34a3 Changing some legal headers for python source files 2015-05-29 12:10:39 -04:00
Luke Sneeringer
d6699353e5 Save hostnames, not IP addresses, for HA. 2014-12-02 10:34:25 -06:00
Luke Sneeringer
ec1c770099 Added job cancelation when switching to secondary. 2014-10-20 08:04:17 -05:00
Luke Sneeringer
e23801313e Do a OneToOne with UnifiedJob. 2014-10-20 08:04:17 -05:00
Luke Sneeringer
f6a501bb28 Register signal against subclasses separately. 2014-10-20 08:04:17 -05:00
Luke Sneeringer
a72e72f20f Adding a post_save receiver. 2014-10-20 08:04:17 -05:00
Luke Sneeringer
ec7aa1867f Adding JobOrigin model and migration. 2014-10-20 08:04:16 -05:00
Luke Sneeringer
60dae748dc Adding IP Address to the instance.
Needed for redirecting.
2014-10-20 08:04:16 -05:00
Luke Sneeringer
a9260db790 Adding an instance manager. 2014-10-20 08:04:16 -05:00
Luke Sneeringer
35c6c72f2e Add Instance.role property. 2014-10-20 08:04:15 -05:00
Luke Sneeringer
072ab2118f Instance model (for HA) 2014-10-20 08:04:15 -05:00