Commit Graph

102 Commits

Author SHA1 Message Date
AlanCoding
d8d710a83d get rid of decorator dependency 2018-10-31 11:37:10 -04:00
Ryan Petrello
ff1e8cc356 replace celery task decorators with a kombu-based publisher
this commit implements the bulk of `awx-manage run_dispatcher`, a new
command that binds to RabbitMQ via kombu and balances messages across
a pool of workers that are similar to celeryd workers in spirit.
Specifically, this includes:

- a new decorator, `awx.main.dispatch.task`, which can be used to
  decorate functions or classes so that they can be designated as
  "Tasks"
- support for fanout/broadcast tasks (at this point in time, only
  `conf.Setting` memcached flushes use this functionality)
- support for job reaping
- support for success/failure hooks for job runs (i.e.,
  `handle_work_success` and `handle_work_error`)
- support for auto scaling worker pool that scale processes up and down
  on demand
- minimal support for RPC, such as status checks and pool recycle/reload
2018-10-11 10:53:30 -04:00
Ryan Petrello
24f8cb49b5 don't access the database in our custom route_for_task
If database connectivity is lost/interrupted in this block of celery
internals, beat is *not* smart enough to recover, and it gets stuck in
an endless fail loop.  We don't _need_ to talk to the database here
anyways; just use settings.CLUSTER_HOST_ID to get what we need.

see: https://github.com/ansible/tower/issues/2957
2018-08-30 11:40:43 -04:00
Ryan Petrello
4eeb62766e apply sensitive field filtering to /api/v2/hosts/?host_filter
see: https://github.com/ansible/tower/issues/2874
see: https://github.com/ansible/tower/issues/2889
2018-08-21 08:17:14 -04:00
Ryan Petrello
3cdd0a94bd simplify dynamic queue binding
we recently made a change so that instances no longer bind to
instance-group specific queues, but now instead they each bind to
a direct queue for their specific hostname
(https://github.com/ansible/tower/pull/1922)

Because of this, we shouldn't *need* to reconfigure the queue binds at
runtime anymore when group membership changes. Under our new model,
every celeryd listens on a queue named after its hostname; when the
scheduler finds a task to run, it picks an Instance in the target
Instance Group and sends the task to the queue for that Instance's
hostname.
2018-07-28 00:48:37 -04:00
Ryan Petrello
6bd9792518 default HTTP-based log emits to HTTPS
see: https://github.com/ansible/awx/issues/2048
2018-07-09 11:52:04 -04:00
Ryan Petrello
eb9083c447 allow the bundled ansible virtualenv to be selected on JT/Proj
see: https://github.com/ansible/awx/issues/34
2018-06-18 14:33:09 -04:00
Yunfan Zhang
2a983e3dec Merge pull request #2152 from YunfanZhang42/host_filter_case_insensitive
Make search in Smart Filter case insensitive.
2018-06-13 16:21:00 -04:00
Yunfan Zhang
f332c0b8c3 Make search in host_filter case insensitive. 2018-06-13 14:16:48 -04:00
chris meyers
fb11967114 remove isolated instance group queue listening 2018-06-08 13:46:58 -04:00
chris meyers
1cea20092c remove rampart group queue subscription
* We now target Instances in the task manager when transitioning jobs
from pending to waiting; whereas before we submitted jobs to Instance
Groups to be picked up by Instance's in those Instance Groups.
Subscribing Instances to their Instance Groups is no longer needed. This
change removes the Instance Group queue subscription.
2018-06-08 11:20:54 -04:00
AlanCoding
015b19d8c3 interpret null protocol as logging.NullHandler 2018-05-30 15:21:51 -04:00
chris meyers
04767641af isolate cache 2018-05-17 12:58:11 -04:00
chris meyers
9f745dd3b8 control celery routes using celery router
* Each time a route is needed (i.e. when a task is sumitted to celery).
The router will be queried. This is ideal. With the previous method we
had to consider how a change in the routes would propogate to all celery
workers and nodes.

* fully describe the default awx queue
* Our dynamic queue registration would correct awx_private_queue.
However, we don't want celery to even create an "invalid"/extra
queue-exchange-route. This change makes sure we don't create extranious
things in rabbitmq.

* reduce the cluster queue registration output. Only output when the
queue registration list changes.
2018-05-02 12:57:36 -04:00
AlanCoding
ac20aa954a Replace logging-related restart with dynamic handler
refactor existing handlers to be the related
  "real" handler classes, which are swapped
  out dynamically by external logger "proxy" handler class

real handler swapout only done on setting change

remove restart_local_services method
get rid of uWSGI fifo file

change TCP/UDP return type contract so that it mirrors
  the request futures object
add details to socket error messages
2018-05-02 09:47:22 -04:00
chris meyers
648d9165ff broadcast queues get a per-node stable queue name
* Using Kombu's default Broadcast() constructor requires only 1
parameter. That parameter defines the exchange name and the queue name
is randomly generated per-node.
* This caused problems if/when celery enters an infinite restart loop
because too many rabbit queues get created and rabbit OOM's
(gracefully).
* To remedy this we tell Broadcast the queue name to use, which is
derived from some constant + the node name so that the per-node queue
name is stable.
2018-05-01 13:09:10 -04:00
Ryan Petrello
1eb5e98743 Merge branch 'release_3.2.4' into release_3.3.0 2018-04-26 11:10:28 -04:00
Wayne Witzel III
404b476576 Allow real null to be searched in host_filter 2018-04-25 11:46:21 -04:00
chris meyers
a56771c8f0 send all tower work to a user-hidden queue
* Before, we had a special group, tower, that ran any async work that
tower needed done. This allowed users fine grain control over which
nodes did background work. However, this granularity was too complicated
for users. So now, all tower system work goes to a special non-user
exposed celery queue. Tower remains the fallback instance group to
execute jobs on. The tower group will be created upon install and
protected from deletion.
2018-04-20 13:04:36 -04:00
Ryan Petrello
f8211b0588 add more edge case handling for yaml unsafe marking 2018-04-19 09:16:22 -04:00
AlanCoding
c397cacea5 add protection for job-compatible vars 2018-04-18 07:14:02 -04:00
Ryan Petrello
835f2eebc3 make extra var YAML serialization more robust to non-dict extra vars 2018-04-17 15:39:37 -04:00
Ryan Petrello
88c243c92a mark all unsafe launch-time extra vars as !unsafe
see: https://github.com/ansible/tower/issues/1338
see: https://bugzilla.redhat.com/show_bug.cgi?id=1565865
2018-04-16 16:47:44 -04:00
AlanCoding
8c167e50c9 Continuously stream data from verbose jobs
In verbose unified job models (inventory updates, system jobs,
etc.), do not delay dispatch just because the encoded
event data is not part of the data written to the buffer.

This allows output from these commands to be submitted
to the callback queue as they are produced, instead
of waiting until the buffer is closed.
2018-03-28 11:05:49 -04:00
Matthew Jones
acde2520d0 Sort cloud regions in a stable way
* All comes first
* Then US regions
* Then all other regions alphabetically
2018-03-13 15:31:28 -04:00
chris meyers
6606a29f57 celery 4.x -> 3.x change route config name 2018-02-27 14:13:05 -05:00
Matthew Jones
0d2daecf49 Merge pull request #1243 from matburt/fix_clustering_isolated
Fix isolated instance clustering implementation
2018-02-14 08:32:24 -05:00
Matthew Jones
925d9efecf Fixing up isolated node execution after cluster changes
* Rework queue detection to include control groups and isolated instances
* Fix up development tooling around isolated nodes
* Update unit tests
2018-02-13 21:51:38 -05:00
cclauss
2e623ad80c Change unicode() --> six.text_type() for Python 3 2018-02-11 21:09:12 +01:00
cclauss
c371b869dc basestring to six.string_types for Python 3 2018-02-09 16:28:36 +01:00
cclauss
e1a8b69736 from six.moves import xrange for Python 3 2018-02-08 22:41:33 +01:00
Matthew Jones
70bf78e29f Apply capacity algorithm changes
* This also adds fields to the instance view for tracking cpu and
  memory usage as well as information on what the capacity ranges are
* Also adds a flag for enabling/disabling instances which removes them
  from all queues and has them stop processing new work
* The capacity is now based almost exclusively on some value relative
  to forks
* capacity_adjustment allows you to commit an instance to a certain
  amount of forks, cpu focused or memory focused
* Each job run adds a single fork overhead (that's the reasoning
  behind the +1)
2018-02-01 16:57:09 -05:00
Matthew Jones
d9e774c4b6 Updates for automatic triggering of policies
* Switch policy router queue to not be "tower" so that we don't
  fall into a chicken/egg scenario
* Show fixed policy list in serializer so a user can determine if
  an instance is manually managed
* Change IG membership mixin to not directly handle applying topology
  changes. Instead it just makes sure the policy instance list is
  accurate
* Add create/delete hooks for instances and groups to trigger policy
  re-evaluation
* Update policy algorithm for fairer distribution
* Fix an issue where CELERY_ROUTES wasn't renamed after celery/django
  upgrade
* Update unit tests to be more explicit
* Update count calculations used by algorithm to only consider
  non-manual instances
* Adding unit tests and fixture
* Don't propagate logging messages from awx.main.tasks and
  awx.main.scheduler
* Use advisory lock to prevent policy eval conflicts
* Allow updating instance groups from view
2018-02-01 16:56:16 -05:00
Chris Meyers
c9ff3e99b8 celeryd attach to queues dynamically
* Based on the tower topology (Instance and InstanceGroup
relationships), have celery dyamically listen to queues on boot
* Add celery task capable of "refreshing" what queues each celeryd
worker listens to. This will be used to support changes in the topology.
* Cleaned up some celery task definitions.
* Converged wrongly targeted job launch/finish messages to 'tower'
queue, rather than a 1-off queue.
* Dynamically route celery tasks destined for the local node
* separate beat process

add support for separate beat process
2018-02-01 16:37:33 -05:00
Ryan Petrello
5387846cbb Merge pull request #992 from ryanpetrello/optimize-output-event-filter
optimize OutputEventFilter for large stdout streams
2018-01-17 14:24:15 -05:00
Chris Meyers
e33265e12c add job_id to fact cache log output 2018-01-17 10:19:27 -05:00
Ryan Petrello
51f7907a01 optimize OutputEventFilter for large stdout streams
update our event data search algorithm to be a bit lazier in event data
discovery; this drastically improves processing speeds for stdout >5MB

see: https://github.com/ansible/awx/issues/417
2018-01-16 14:41:35 -05:00
Ryan Petrello
1e8c89f536 implement support for per-playbook/project/org virtualenvs
see: https://github.com/ansible/awx/issues/34
2018-01-09 22:47:01 -05:00
Aaron Tan
6c2a7f3782 Merge pull request #906 from jangsutsr/refactor_named_url_tests
Refactor named URL unit tests
2018-01-04 14:20:27 -05:00
Aaron Tan
54bf7e13d8 Refactor named URL unit tests
The original tests set no longer works after Django 1.11 due to more
strict rules against dynamic model definition. The refactored tests set
aims at each existing model that apply named URL rules,  instead of
abstract general use cases, thus significantly improves maintainability
and readability.

Signed-off-by: Aaron Tan <jangsutsr@gmail.com>
2018-01-03 14:00:30 -05:00
Ryan Petrello
0b30e7907b change stdout composition to generate from job events on the fly
this approach totally removes the process of reading and writing stdout
files on the local file system at settings.JOBOUTPUT_ROOT when jobs are
run; now stdout content is only written on-demand as it's fetched for
the deprecated `stdout` endpoint

see: https://github.com/ansible/awx/issues/200
2018-01-03 09:09:43 -05:00
Matthew Jones
9dbcc5934e Merge remote-tracking branch 'tower/release_3.2.2' into devel 2017-12-13 12:25:47 -05:00
AlanCoding
dbd68c5747 encryption tests around the contract with survey functionality 2017-12-04 11:45:07 -05:00
Wayne Witzel III
f118e27047 Flake8 fixes and URL updates 2017-11-10 17:04:33 -05:00
Matthew Jones
5635f5fb49 Merge branch 'release_3.2.1' into devel
* release_3.2.1:
  fallback to empty dict when processing extra_data
  fix migration problem from 3.1.1
  move 0005a migration to 0005b
  feedback on ad hoc prohibited vars error msg
  Fix the way we include i18n files in sdist
  Fix migrations to support 3.1.2 -> 3.2.1+ upgrade path
  fix missing parameter to update_capacity method
  fix WARNING log when launching ad hoc command
  Validate against ansible variables on ad hoc launch
  do not allow ansible connection type of local for ad_hoc
  work around an ansible 2.4 inventory caching bug
  fix scan job migration unicode issue
  Assert isolated nodes have capacity set to 0 and restored based on version
  Set capacity to zero if the isolated node has an old version
2017-10-19 13:30:26 -04:00
Alan Rominger
353a9a55c7 Merge pull request #406 from AlanCoding/variables_debt
Consolidation of variables parsing throughout codebase
2017-10-13 15:40:36 -04:00
AlanCoding
993fa9290d additional verbosity for vars parsing exceptions 2017-10-13 11:41:11 -04:00
AlanCoding
f03b40aa50 enforce max line length of 160 characters 2017-10-11 12:38:39 -04:00
AlanCoding
eacbeef660 Validate against ansible variables on ad hoc launch
Share code between this check for ad hoc and JT callback
2017-10-05 12:14:05 -04:00
Chris Meyers
26d393e5c2 2-level memoize
* Allows for invalidating an entire function from the memoizer
2017-09-21 15:34:51 -04:00