315 Commits

Author SHA1 Message Date
Seth Foster
e09274e533 PR #8074 - limit how many jobs the task manager can start on a given run 2020-09-08 12:16:06 -04:00
Ryan Petrello
de59d1d3f6 improve job reaping for jobs that were started on a missing Instance
see: https://github.com/ansible/awx/issues/7848
2020-08-21 16:32:17 -04:00
Jeff Bradberry
ced8f42835 Force worker processes to have a different signal handler from the parent
Situations have come up where the 5+ minute kill signal for
run_task_manager is emitted to the worker process running it, but
since the worker improperly inherited the AWXConsumerBase().stop()
handler a deadlock ultimately was triggered on the database
connection.
2020-06-04 15:41:28 -04:00
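
The idea behind the commit above can be sketched roughly as follows. This is an illustrative sketch only, not the AWX code: `parent_stop_handler`, `install_parent_handlers`, and `worker_init` are hypothetical names, and resetting SIGTERM to the default action is an assumption, since the commit message does not say which handler the workers were given.

```python
import signal


def parent_stop_handler(signum, frame):
    # Hypothetical stand-in for the parent's graceful-stop handler.
    raise SystemExit(0)


def install_parent_handlers():
    # The parent process installs its own shutdown logic.
    signal.signal(signal.SIGTERM, parent_stop_handler)
    signal.signal(signal.SIGINT, parent_stop_handler)


def worker_init():
    # Run at the start of each worker so it does not keep the handlers
    # it inherited from the parent at fork time.
    # Assumption: SIGTERM goes back to the default action and SIGINT is
    # ignored, leaving shutdown coordination to the parent.
    signal.signal(signal.SIGTERM, signal.SIG_DFL)
    signal.signal(signal.SIGINT, signal.SIG_IGN)
```

With the standard library, a function like `worker_init` could be passed as `multiprocessing.Pool(initializer=worker_init)` so that every worker replaces the inherited handlers before it picks up any work.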
Christian Adams
19ccb5e213 Mark job_explanation strings after they are read from the db
- For strings that need to be translated, but are saved in the db:
   * They must be marked for translation using gettext_noop() to be
   translated.
   * And must also be marked for translation with _() when read from db
   and shown to the user.
   * [Ref]: https://docs.djangoproject.com/en/3.0/topics/i18n/translation/#marking-strings-as-no-op
2020-05-15 22:50:50 -04:00
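
The two-step marking described in the commit above can be sketched with the standard library alone. This is a minimal sketch under stated assumptions: `gettext_noop` here is a local stand-in for Django's function of the same name, and `DictTranslations`, the sample strings, and `display_explanation` are hypothetical names invented for illustration.

```python
import gettext


def gettext_noop(message):
    # Stand-in for Django's gettext_noop(): marks the string for
    # extraction but returns it unchanged, so the database stores the
    # untranslated source text.
    return message


class DictTranslations(gettext.NullTranslations):
    # Hypothetical in-memory catalog used instead of a compiled .mo file.
    def __init__(self, catalog):
        super().__init__()
        self._dict_catalog = catalog

    def gettext(self, message):
        # Fall back to the source string when no translation exists.
        return self._dict_catalog.get(message, message)


# Step 1: mark the string when it is written to the database.
stored_explanation = gettext_noop("Job was cancelled by the user.")

# Step 2: translate the stored value when it is read back for display.
spanish = DictTranslations(
    {"Job was cancelled by the user.": "Trabajo cancelado por el usuario."}
)
_ = spanish.gettext


def display_explanation(value_from_db):
    return _(value_from_db)
```

The stored value stays in the source language, so the same database row can be shown in whichever language is active when it is read.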
beeankha
479ab8550d Fix misc. linter errors 2020-05-14 15:43:50 -04:00
Christian Adams
a899a147e1 Fix new flake8 from pyflakes 2.2.0 release 2020-04-20 09:50:50 -04:00
chris meyers
dc6c353ecd remove support for multi-reader dispatch queue
* Under the new postgres backed notify/listen message queue, this never
actually worked. Without using the database to store state, we cannot
provide an at-most-once delivery mechanism w/ multi-readers.
* With this change, work is done ONLY on the node that requested for the
work to be done. Under rabbitmq, the node that was first to get the
message off the queue would do the work; presumably the least busy node.
2020-03-18 16:10:16 -04:00
Bill Nottingham
e2fb83db98 Tweak workflow error message 2020-02-21 02:37:03 -05:00
beeankha
11ccfd8449 Fix misc. linting errors 2020-02-12 12:34:15 -05:00
Seth Foster
9b4b2167b3 TaskManager process dependencies only once
This adds a boolean "dependencies_processed" field to UnifiedJob
model. The default value is False. Once the task manager generates
dependencies for this task, it will not generate them again on
subsequent runs.

The changes also remove .process_dependencies(), as this method repeats
the same code as .process_pending_tasks(), and is not needed. Once
dependencies are generated, they are handled at .process_pending_tasks().

Adds a unit test that should catch regressions for this fix.
2020-02-06 11:47:33 -05:00
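
The "process dependencies only once" behavior described above could look roughly like this. It is a simplified sketch, not the AWX implementation: only the `dependencies_processed` field name comes from the commit message, while the `Task` class, `generate_dependencies`, and the dependency naming are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    name: str
    # Mirrors the boolean field from the commit message; defaults to False.
    dependencies_processed: bool = False
    dependencies: list = field(default_factory=list)


def generate_dependencies(pending_tasks):
    """Create dependencies for tasks that have not been processed yet."""
    created = []
    for task in pending_tasks:
        if task.dependencies_processed:
            # Already handled on an earlier run; do not generate again.
            continue
        dependency = f"update-for-{task.name}"  # hypothetical dependency
        task.dependencies.append(dependency)
        task.dependencies_processed = True
        created.append(dependency)
    return created
```

Calling `generate_dependencies` twice on the same task returns a new dependency the first time and an empty list the second time, which is the regression the commit's unit test is meant to catch.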
Rebeccah
63ae2cac38 Jake McDermott found behavior that revealed a logical bug: it would have caused issues later with ALL convergence nodes in sequential order via the API (although not the UI), and it was already causing Root Nodes to spawn repeatedly. To fix this, I refactored the code that marks DNR nodes into its own function that checks the parents' convergence criteria, and leveraged that in bfs_nodes_to_run so that root nodes and convergence nodes can be differentiated while both are processed correctly, and so that children of convergence nodes can be properly traversed by the function 2020-02-05 14:28:35 -05:00
Rebeccah
4e787cc079 made marking nodes as DNR more 'eager', added more unit tests, and added convergence check to bfs_nodes_to_run with new changes to the eagerness of DNR marking since it needs it to prevent convergence nodes from running too quickly 2020-02-05 14:28:35 -05:00
Rebeccah
a419547731 redid some formatting and syntax per personal preferences, comments on PR, and suggestions from @jrb 2020-02-05 14:28:35 -05:00
Rebeccah
6d2a2ab714 drastically improved performance by removing unnecessary iteration over children of parent nodes, additionally added an extra check that the node didn't already have a job so that it wasn't cycling over nodes that had already run when running through all_nodes 2020-02-05 14:28:35 -05:00
Rebeccah
82dd4a3884 remove node_object comparison and use the full dict to eliminate issues comparing obj; compare the whole node object with the node objects in the list instead 2020-02-05 14:28:35 -05:00
Rebeccah
86a39938fe fixed issue where successful convergence wasn't being met due to the not quite correct leveraging of get_children 2020-02-05 14:28:35 -05:00
Rebeccah
70cf4cf5d4 added in handling for a parent being DNR so status is only checked if the parent isn't a DNR parent (in which case the parent has no status, which was breaking the logic) also edited a comment and added in a DNR check that @alancoding suggested to cut out duplicates in the DAG list 2020-02-05 14:28:34 -05:00
Rebeccah
f7f648b956 included all_parents_must_converge in the get_workflow_job_fieldnames so that the true/false is copied into the job node and not just in the template node. Also added in the migration for the DB, also relocated logic from bfs_nodes_to_run down into mark_dnr_nodes to prevent nodes not being marked as DNR but not being marked to run, causing them to run anyways 2020-02-05 14:28:34 -05:00
Rebeccah
780f104ab2 shifted from dependants/dependencies to children/parents for clarity in function names, also added in toggle logic 2020-02-05 14:28:34 -05:00
Rebeccah
4c35adad6c added logic to include workflow convergence nodes to nodes to run or not run based on their parents successful statuses 2020-02-05 14:28:34 -05:00
Rebeccah
cf24c81b3e updated syntax from python2 to 3 2020-02-05 14:28:34 -05:00
AlanCoding
1f46878652 Revert "Apply migration flag check to task manager"
This reverts commit a0910eb6de.
2020-01-02 09:08:17 -05:00
Seth Foster
b26b8e7097 Prevent running jobs from blocking inventory updates
A running job that has an inventory source will block
that inventory update from running. This fix removes
the block.

The test creates a job in running state, and an inventory
update in pending state. The test asserts that the
task manager and dependency graph .is_job_blocked method
returns False for the inventory update (i.e. update can
run).

issue #4809
2019-12-16 15:15:23 -05:00
Seth Foster
63e9aed601 Prevent running jobs from blocking project updates
A running job that has a project update will block
that update from running. This fix removes
the block.
Adds a functional test that sets up a job in "running" state, and
starts a project update that is in "pending" state. Assert that
the task manager and dependency graph .is_job_blocked methods both
return False.

issue #5153
2019-12-16 13:43:42 -05:00
AlanCoding
a0910eb6de Apply migration flag check to task manager 2019-12-15 22:56:57 -05:00
Ryan Petrello
513f54a422 fix a few bugs related to container group execution
see: https://github.com/ansible/awx/issues/5326
2019-11-14 13:23:38 -05:00
Ryan Petrello
ccaaee61f0 improve cleanup of anonymous kubeconfig files 2019-10-29 11:24:12 -04:00
Ryan Petrello
7f1096f711 reap k8s-based jobs when the dispatcher restarts 2019-10-29 11:24:11 -04:00
Ryan Petrello
16812542f8 implement a simple periodic pod reaper for container groups
see: https://github.com/ansible/awx/issues/4911
2019-10-17 17:06:36 -04:00
Ryan Petrello
1cf02e1e17 properly set execution_node for project and inv updates run "in k8s"
see: https://github.com/ansible/awx/issues/4907
2019-10-17 15:15:24 -04:00
Shane McDonald
8f75382b81 Implement retry logic for container group pod launches 2019-10-10 15:53:56 -04:00
Shane McDonald
b93164e1ed Prevent pods from failing if the reason is because of a resource quota
Signed-off-by: Shane McDonald <me@shanemcd.com>
2019-10-10 15:53:50 -04:00
Shane McDonald
bd5003ca98 Task manager / scheduler Kubernetes integration 2019-10-04 13:21:21 -04:00
beeankha
57fd6b7280 Set default messages for approval notifications 2019-09-27 15:48:00 -04:00
beeankha
13450fdbf9 Set up approval notifications to send 2019-09-27 15:48:00 -04:00
beeankha
6be2d84adb Add endpoints for approval node notifications
...and also add a migration file.
2019-09-27 15:48:00 -04:00
beeankha
2fc7e93c6a Emit websocket for approval node timeout
...and update timeout_message to be more translation-friendly.
2019-08-29 14:30:33 -04:00
beeankha
ea509f518e Addressing comments, updating tests, etc. 2019-08-27 15:38:15 -04:00
beeankha
f6f6e5883a Update websockets for pending approvals, change timeout expiration to 2019-08-27 15:36:27 -04:00
beeankha
d9f3fed06f Update UJ/UJT endpoints, update approval RBAC, update approval timeout 2019-08-27 15:36:25 -04:00
beeankha
544a5063f3 Update timeout implementation, placeholder code for possible websocket support 2019-08-27 15:36:24 -04:00
beeankha
8c17990750 Activity stream and timeout
Update activity stream to show approval node info, add meaningful log
message for expired approval nodes in the Task Manager timeout
function.
2019-08-27 15:36:24 -04:00
Ryan Petrello
0522d45ab0 fixed a few issues related to approval role RBAC for normal users 2019-08-27 15:36:23 -04:00
beeankha
28289e85c1 Add timeout for workflow approval nodes 2019-08-27 15:36:22 -04:00
beeankha
5f82754a3f Clean up RBAC code 2019-08-27 15:36:22 -04:00
beeankha
296b4e830b Add more RBAC for approval nodes 2019-08-27 15:36:21 -04:00
beeankha
64c94d478d Add more RBAC, filter out AJT/AJs from unified jobs lists
Comment out placeholder in serializer
2019-08-27 15:36:17 -04:00
beeankha
453e142635 Fix UJT-related error, add notification placeholders 2019-08-27 15:35:43 -04:00
beeankha
9cfed6f2a8 Add check for no-op case back, remove redundant on_commit code 2019-06-17 10:47:58 -04:00
beeankha
95896b1acd Edit wfj running notification trigger 2019-06-17 10:47:58 -04:00