Commit Graph

22 Commits

Author SHA1 Message Date
Alan Rominger
93c329d9d5 Fix cancel bug - WorkflowManager cancel in transaction (#14608)
This fixes a bug where jobs within a workflow job were not canceled
  when the workflow job was canceled by the user

The fix is to submit the cancel request as a part of the
  transaction that WorkflowManager commits its work in
  this requires that we send the message without expecting a reply
  so this changes the control-with-reply cancel to just a control function
2023-10-30 15:30:18 -04:00
Alan Rominger
284bd8377a Integrate scheduler into dispatcher main loop (#14067)
Dispatcher refactoring to get pg_notify publish payload
  as separate method

Refactor periodic module under dispatcher entirely
  Use real numbers for schedule reference time
  Run based on due_to_run method

Review comments about naming and code comments
2023-08-10 14:43:07 -04:00
Hao Liu
cd3f7666be add get_task_queuename
get_local_queuename will return the pod name of the instance

now that web and task are in different pods when web container queue a task it will be put into a queue without as task worker to execute the task
2023-03-29 22:09:19 -04:00
Alan Rominger
f5785976be Update to comply with new black rules 2023-02-01 14:59:38 -05:00
Alan Rominger
192f45bbd0 Make canceling view non-atomic to fix 500 errors with job bursts (#13072)
* Make canceling view non-atomic to fix 500 errors with job bursts

* Update test calls for cancel method changes
2022-10-20 15:02:54 -04:00
Alan Rominger
c59bbdecdb Refactor canceling to work through messaging and signals, not database
If canceled attempted before, still allow attempting another cancel
in this case, attempt to send the sigterm signal again.
Keep clicking, you might help!

Replace other cancel_callbacks with sigterm watcher
  adapt special inventory mechanism for this too

Get rid of the cancel_watcher method with exception in main thread

Handle academic case of sigterm race condition

Process cancelation as control signal

Fully connect cancel method and run_dispatcher to control

Never transition workflows directly to canceled, add logs
2022-09-01 15:20:31 -04:00
Alan Rominger
01037fa561 Fix sanity check to use the relevant active connection 2022-08-29 16:33:07 -04:00
Jeff Bradberry
b852baaa39 Fix up logger .warn() calls to use .warning() instead
This is a usage that was deprecated in Python 3.0.
2022-03-07 18:11:36 -05:00
Ryan Petrello
c2ef0a6500 move code linting to a stricter pep8-esque auto-formatting tool, black 2021-03-23 09:39:58 -04:00
Ryan Petrello
cd0b9de7b9 remove multiprocessing.Queue usage from the callback receiver
instead, just have each worker connect directly to redis
this has a few benefits:

- it's simpler to explain and debug
- back pressure on the queue keeps messages around in redis (which is
  observable, and survives the restart of Python processes)
- it's likely notably more performant at high loads
2020-09-24 13:53:58 -04:00
Ryan Petrello
57f8e48894 make --status more robust for dispatcher, and add support for receiver
make the --status flag work by fetching a periodically recorded snapshot
of internal process state; additionally, update the callback receiver to
*also* record these statistics so we can gain more insight into any
performance issues
2020-09-17 15:33:37 -04:00
Ryan Petrello
a0e5e74cab fix a typo in an f-string 2020-07-31 12:48:45 -04:00
chris meyers
5e481341bc flake8 2020-03-19 10:01:20 -04:00
chris meyers
7f2e1d46bc replace janky unique channel name w/ uuid
* postgres notify/listen channel names have size limitations as well as
character limitations. Respect those limitations while at the same time
generate a unique channel name.
2020-03-19 08:59:15 -04:00
chris meyers
12158bdcba remove dead code 2020-03-19 08:57:05 -04:00
chris meyers
093d204d19 fix flake8 2020-03-18 16:10:19 -04:00
chris meyers
be58906aed remove kombu 2020-03-18 16:10:17 -04:00
chris meyers
2a2c34f567 combine all the broker replacement pieces
* local redis for event processing
* postgres for message broker
* redis for websockets
2020-03-18 16:10:15 -04:00
chris meyers
c8eeacacca POC channels 2 2020-03-18 16:10:12 -04:00
Ryan Petrello
40b1e89b67 add the ability to disable RabbitMQ queue durability 2019-05-28 15:49:32 -04:00
Ryan Petrello
3be9113d6b fix a bug that breaks job cancel on single node jobs
1.  Install awx w/ a single node.
2.  Start a long-running job.
3.  Forcibly kill the `awx-manage run_dispatcher` process (e.g.,
    SIGKILL) and do not start it again.
4.  The job remains in running - without a second cluster to discover
    the job, it is never reaped.
5.  This PR allows you to cancel the job from the UI+API.
2018-10-19 09:10:33 -04:00
Ryan Petrello
ff1e8cc356 replace celery task decorators with a kombu-based publisher
this commit implements the bulk of `awx-manage run_dispatcher`, a new
command that binds to RabbitMQ via kombu and balances messages across
a pool of workers that are similar to celeryd workers in spirit.
Specifically, this includes:

- a new decorator, `awx.main.dispatch.task`, which can be used to
  decorate functions or classes so that they can be designated as
  "Tasks"
- support for fanout/broadcast tasks (at this point in time, only
  `conf.Setting` memcached flushes use this functionality)
- support for job reaping
- support for success/failure hooks for job runs (i.e.,
  `handle_work_success` and `handle_work_error`)
- support for auto scaling worker pool that scale processes up and down
  on demand
- minimal support for RPC, such as status checks and pool recycle/reload
2018-10-11 10:53:30 -04:00