Seth Foster
e09274e533
PR #8074 - limit how many jobs the task manager can start on a given run
2020-09-08 12:16:06 -04:00
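The title above only names the idea. Here is a minimal sketch of capping how many jobs a single task-manager run starts. It assumes pending tasks arrive as an ordered list. `START_TASK_LIMIT` and `start_pending_tasks` are hypothetical names; the commit does not give the real limit or its setting name.

```python
from typing import List

# Hypothetical cap; the commit title does not state the real limit or its name.
START_TASK_LIMIT = 2

def start_pending_tasks(pending: List[str], limit: int = START_TASK_LIMIT) -> List[str]:
    """Start at most `limit` tasks in one task-manager run; return those started."""
    started: List[str] = []
    for task in pending:
        if len(started) >= limit:
            # Everything left stays pending and is considered on the next run.
            break
        started.append(task)  # stand-in for actually launching the job
    return started

print(start_pending_tasks(["job-1", "job-2", "job-3"]))  # ['job-1', 'job-2']
```

Tasks past the cap are not dropped. In this sketch they simply remain in `pending` for the next run.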
Ryan Petrello
de59d1d3f6
improve job reaping for jobs that were started on a missing Instance
see: https://github.com/ansible/awx/issues/7848
2020-08-21 16:32:17 -04:00
Jeff Bradberry
ced8f42835
Force worker processes to have a different signal handler from the parent
Situations have come up where the 5+ minute kill signal for
run_task_manager is sent to the worker process running it, but
because the worker had improperly inherited the AWXConsumerBase().stop()
handler, the signal ultimately triggered a deadlock on the database
connection.
2020-06-04 15:41:28 -04:00
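This is a minimal sketch of the idea using Python's standard `signal` module. It assumes the worker installs its own handlers as the first thing it does after being forked. `parent_stop`, `worker_signal_handler`, and `install_worker_handlers` are hypothetical names, not AWX's actual code.

```python
import signal

def parent_stop(signum, frame):
    # Hypothetical stand-in for the parent's graceful-shutdown handler
    # (AWXConsumerBase().stop() in the commit message). A forked worker
    # inherits this disposition unless it installs its own.
    raise SystemExit("parent shutting down")

def worker_signal_handler(signum, frame):
    # Worker-specific handler: exit this worker only, without running any
    # shutdown logic (such as db connection cleanup) that belongs to the parent.
    raise SystemExit(128 + signum)

def install_worker_handlers() -> None:
    # Call at the start of the worker process, right after fork.
    for sig in (signal.SIGTERM, signal.SIGINT):
        signal.signal(sig, worker_signal_handler)

signal.signal(signal.SIGTERM, parent_stop)  # what the parent sets up
install_worker_handlers()                   # what the forked worker does
print(signal.getsignal(signal.SIGTERM) is worker_signal_handler)  # True
```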
Christian Adams
19ccb5e213
Mark job_explanation strings after they are read from the db
- For strings that need to be translated, but are saved in the db:
* They must be marked for translation with gettext_noop() when saved, so
that they are picked up for translation.
* They must also be translated with _() when they are read from the db
and shown to the user.
* [Ref]: https://docs.djangoproject.com/en/3.0/topics/i18n/translation/#marking-strings-as-no-op
2020-05-15 22:50:50 -04:00
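The pattern above stores the untranslated source string and translates it only at display time. Django provides `gettext_noop` in `django.utils.translation`. So that this sketch runs without Django, it uses a local identity stand-in for `gettext_noop` and the standard library's `gettext.gettext` as `_`. `Job`, `fail_job`, `render_explanation`, and the message text are hypothetical.

```python
import gettext

# Stand-in for django.utils.translation.gettext_noop: it only marks the
# string for extraction by message tools and returns it unchanged.
def gettext_noop(message: str) -> str:
    return message

# Stand-in for Django's _(); with no catalog installed, the stdlib
# gettext.gettext() returns the message untranslated.
_ = gettext.gettext

class Job:
    """Hypothetical model whose job_explanation is stored in the db."""
    def __init__(self) -> None:
        self.job_explanation = ""

def fail_job(job: Job) -> None:
    # Save the untranslated English source string to the db.
    job.job_explanation = gettext_noop("Job was canceled before it started.")

def render_explanation(job: Job) -> str:
    # Translate at read time, in the language active for the current user.
    return _(job.job_explanation)

job = Job()
fail_job(job)
print(render_explanation(job))
```

Storing the source string rather than an already-translated one means each reader can see the explanation in their own language.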
beeankha
479ab8550d
Fix misc. linter errors
2020-05-14 15:43:50 -04:00
Christian Adams
a899a147e1
Fix new flake8 from pyflakes 2.2.0 release
2020-04-20 09:50:50 -04:00
chris meyers
dc6c353ecd
remove support for multi-reader dispatch queue
* Under the new Postgres-backed NOTIFY/LISTEN message queue, this never
actually worked. Without using the database to store state, we cannot
provide an at-most-once delivery mechanism with multiple readers.
* With this change, work is done ONLY on the node that requested the
work. Under RabbitMQ, the node that was first to get the
message off the queue would do the work; presumably the least busy node.
2020-03-18 16:10:16 -04:00
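The commit notes that at-most-once delivery to multiple readers would need state stored in the database. This minimal sketch shows what that would involve. It uses SQLite from the standard library as a stand-in for Postgres, along with a hypothetical `dispatch_queue` table. Each reader tries to claim a message with a conditional UPDATE, and only the reader whose UPDATE matches a row does the work. The commit avoids this entirely by doing work only on the requesting node.

```python
import sqlite3

def claim_message(conn: sqlite3.Connection, msg_id: int, node: str) -> bool:
    # The UPDATE only matches while the message is unclaimed, so at most
    # one reader ever sees rowcount == 1 for a given message.
    with conn:  # commits the transaction on success
        cur = conn.execute(
            "UPDATE dispatch_queue SET claimed_by = ? "
            "WHERE id = ? AND claimed_by IS NULL",
            (node, msg_id),
        )
    return cur.rowcount == 1

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dispatch_queue (id INTEGER PRIMARY KEY, body TEXT, claimed_by TEXT)"
)
conn.execute("INSERT INTO dispatch_queue (id, body) VALUES (1, 'run_task_manager')")
conn.commit()

print(claim_message(conn, 1, "node-a"))  # True
print(claim_message(conn, 1, "node-b"))  # False
```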
Bill Nottingham
e2fb83db98
Tweak workflow error message
2020-02-21 02:37:03 -05:00
beeankha
11ccfd8449
Fix misc. linting errors
2020-02-12 12:34:15 -05:00
Seth Foster
9b4b2167b3
TaskManager process dependencies only once
This adds a boolean "dependencies_processed" field to UnifiedJob
model. The default value is False. Once the task manager generates
dependencies for this task, it will not generate them again on
subsequent runs.
The changes also remove .process_dependencies(), because this method duplicates
the code in .process_pending_tasks() and is not needed. Once
dependencies are generated, they are handled in .process_pending_tasks().
Adds a unit test that should catch regressions for this fix.
2020-02-06 11:47:33 -05:00
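This is a minimal sketch of the "process dependencies only once" behavior described above. A plain dataclass stands in for the UnifiedJob model. `generate_dependencies` is hypothetical and pretends every job needs exactly one project update. Only the `dependencies_processed` flag and its False default come from the commit.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class UnifiedJob:
    name: str
    # Mirrors the boolean field added by the commit; defaults to False.
    dependencies_processed: bool = False

def generate_dependencies(task: UnifiedJob) -> List[UnifiedJob]:
    # Hypothetical: pretend every job needs one project update first.
    return [UnifiedJob(f"project update for {task.name}", dependencies_processed=True)]

def process_pending_tasks(pending: List[UnifiedJob]) -> List[UnifiedJob]:
    """One task-manager pass; returns the dependency jobs created in it."""
    created: List[UnifiedJob] = []
    for task in pending:
        if not task.dependencies_processed:
            created.extend(generate_dependencies(task))
            # Flag the task so later passes skip dependency generation for it.
            task.dependencies_processed = True
    return created

job = UnifiedJob("deploy")
first = process_pending_tasks([job])
second = process_pending_tasks([job])
print(len(first), len(second))  # 1 0
```

Without the flag, the second pass would create another project update for the same job.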
Rebeccah
63ae2cac38
Jake McDermott found behavior that revealed a logical bug: it would later have caused issues with ALL convergence nodes in sequential order via the API (although not the UI), and it was already causing Root Nodes to spawn repeatedly. To fix this, I refactored the code that marks DNR nodes into its own function, which checks the parents' convergence criteria, and used it in bfs_nodes_to_run. Root nodes and convergence nodes can now be told apart while both are still processed correctly, and the function now traverses the children of convergence nodes properly.
2020-02-05 14:28:35 -05:00
Rebeccah
4e787cc079
made marking nodes as DNR more 'eager', added more unit tests, and added a convergence check to bfs_nodes_to_run; the more eager DNR marking needs this check to prevent convergence nodes from running too early
2020-02-05 14:28:35 -05:00
Rebeccah
a419547731
redid some formatting and syntax per personal preferences, comments on PR, and suggestions from @jrb
2020-02-05 14:28:35 -05:00
Rebeccah
6d2a2ab714
drastically improved performance by removing unnecessary iteration over the children of parent nodes; also added a check that the node doesn't already have a job, so that iterating through all_nodes no longer cycles over nodes that have already run
2020-02-05 14:28:35 -05:00
Rebeccah
82dd4a3884
remove the node_object comparison and instead compare the whole node object (the full dict) with the node objects in the list, eliminating issues with comparing obj
2020-02-05 14:28:35 -05:00
Rebeccah
86a39938fe
fixed an issue where successful convergence wasn't being met due to incorrect use of get_children
2020-02-05 14:28:35 -05:00
Rebeccah
70cf4cf5d4
added handling for a DNR parent, so that a parent's status is only checked if the parent isn't DNR (a DNR parent has no status, which was breaking the logic); also edited a comment and added a DNR check suggested by @alancoding to cut out duplicates in the DAG list
2020-02-05 14:28:34 -05:00
Rebeccah
f7f648b956
included all_parents_must_converge in get_workflow_job_fieldnames so that the true/false value is copied into the job node and not just kept on the template node. Also added the DB migration, and moved logic from bfs_nodes_to_run down into mark_dnr_nodes, so that nodes that were neither marked DNR nor marked to run no longer run anyway
2020-02-05 14:28:34 -05:00
Rebeccah
780f104ab2
shifted from dependants/dependencies to children/parents for clarity in function names, also added in toggle logic
2020-02-05 14:28:34 -05:00
Rebeccah
4c35adad6c
added logic to include workflow convergence nodes in the nodes to run or not run, based on whether their parents succeeded
2020-02-05 14:28:34 -05:00
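The convergence commits above decide, for each node, whether to run it, mark it DNR, or wait. This is a deliberately simplified sketch of that decision. Every edge is treated as an "on success" edge (real workflows also have failure and always edges), and `Node` and `decide` are hypothetical names rather than the actual bfs_nodes_to_run / mark_dnr_nodes code. It does reflect three points from the commits: DNR parents have no status, DNR marking is eager for convergence nodes, and all_parents_must_converge switches between "all parents" and "any parent".

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Node:
    name: str
    all_parents_must_converge: bool = False
    status: Optional[str] = None  # None until the node's job has finished
    do_not_run: bool = False      # True once the node is marked DNR
    parents: List["Node"] = field(default_factory=list)

def decide(node: Node) -> Optional[bool]:
    """Return True to run the node, False to mark it DNR, None to keep waiting.

    Simplified: every edge is treated as an "on success" edge.
    """
    if not node.parents:
        return True  # root node
    # A DNR parent never gets a status, so it counts as finished but not successful.
    waiting = [p.status is None and not p.do_not_run for p in node.parents]
    succeeded = [p.status == "successful" for p in node.parents]
    if node.all_parents_must_converge:
        # Eager DNR: one parent that finished without success rules the node out.
        if any(not w and not ok for w, ok in zip(waiting, succeeded)):
            return False
        return None if any(waiting) else True
    # Without convergence, any successful parent is enough to run the node.
    if any(succeeded):
        return True
    return None if any(waiting) else False

ok = Node("a", status="successful")
pending = Node("b")
failed = Node("c", status="failed")
print(decide(Node("join", True, parents=[ok, pending])))  # None
print(decide(Node("join", True, parents=[ok, failed])))   # False
print(decide(Node("any", parents=[ok, pending])))         # True
```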
Rebeccah
cf24c81b3e
updated syntax from python2 to 3
2020-02-05 14:28:34 -05:00
AlanCoding
1f46878652
Revert "Apply migration flag check to task manager"
This reverts commit a0910eb6de.
2020-01-02 09:08:17 -05:00
Seth Foster
b26b8e7097
Prevent running jobs from blocking inventory updates
...
A running job that has an inventory source will block
that inventory update from running. This fix removes
the block.
The test creates a job in running state, and an inventory
update in pending state. The test asserts that the
task manager and dependency graph .is_job_blocked methods
both return False for the inventory update (i.e. the update can
run).
issue #4809
2019-12-16 15:15:23 -05:00
Seth Foster
63e9aed601
Prevent running jobs from blocking project updates
...
A running job that has a project update will block
that update from running. This fix removes
the block.
Adds a functional test that sets up a job in "running" state, and
starts a project update that is in "pending" state. Assert that
the task manager and dependency graph .is_job_blocked methods both
return False.
issue #5153
2019-12-16 13:43:42 -05:00
AlanCoding
a0910eb6de
Apply migration flag check to task manager
2019-12-15 22:56:57 -05:00
Ryan Petrello
513f54a422
fix a few bugs related to container group execution
see: https://github.com/ansible/awx/issues/5326
2019-11-14 13:23:38 -05:00
Ryan Petrello
ccaaee61f0
improve cleanup of anonymous kubeconfig files
2019-10-29 11:24:12 -04:00
Ryan Petrello
7f1096f711
reap k8s-based jobs when the dispatcher restarts
2019-10-29 11:24:11 -04:00
Ryan Petrello
16812542f8
implement a simple periodic pod reaper for container groups
see: https://github.com/ansible/awx/issues/4911
2019-10-17 17:06:36 -04:00
Ryan Petrello
1cf02e1e17
properly set execution_node for project and inv updates run "in k8s"
see: https://github.com/ansible/awx/issues/4907
2019-10-17 15:15:24 -04:00
Shane McDonald
8f75382b81
Implement retry logic for container group pod launches
2019-10-10 15:53:56 -04:00
Shane McDonald
b93164e1ed
Prevent pods from failing if the reason is because of a resource quota
Signed-off-by: Shane McDonald <me@shanemcd.com>
2019-10-10 15:53:50 -04:00
Shane McDonald
bd5003ca98
Task manager / scheduler Kubernetes integration
2019-10-04 13:21:21 -04:00
beeankha
57fd6b7280
Set default messages for approval notifications
2019-09-27 15:48:00 -04:00
beeankha
13450fdbf9
Set up approval notifications to send
2019-09-27 15:48:00 -04:00
beeankha
6be2d84adb
Add endpoints for approval node notifications
...and also add a migration file.
2019-09-27 15:48:00 -04:00
beeankha
2fc7e93c6a
Emit websocket for approval node timeout
...and update timeout_message to be more translation-friendly.
2019-08-29 14:30:33 -04:00
beeankha
ea509f518e
Addressing comments, updating tests, etc.
2019-08-27 15:38:15 -04:00
beeankha
f6f6e5883a
Update websockets for pending approvals, change timeout expiration to
2019-08-27 15:36:27 -04:00
beeankha
d9f3fed06f
Update UJ/UJT endpoints, update approval RBAC, update approval timeout
2019-08-27 15:36:25 -04:00
beeankha
544a5063f3
Update timeout implementation, placeholder code for possible websocket support
2019-08-27 15:36:24 -04:00
beeankha
8c17990750
Activity stream and timeout
Update activity stream to show approval node info, add meaningful log
message for expired approval nodes in the Task Manager timeout
function.
2019-08-27 15:36:24 -04:00
Ryan Petrello
0522d45ab0
fixed a few issues related to approval role RBAC for normal users
2019-08-27 15:36:23 -04:00
beeankha
28289e85c1
Add timeout for workflow approval nodes
2019-08-27 15:36:22 -04:00
beeankha
5f82754a3f
Clean up RBAC code
2019-08-27 15:36:22 -04:00
beeankha
296b4e830b
Add more RBAC for approval nodes
2019-08-27 15:36:21 -04:00
beeankha
64c94d478d
Add more RBAC, filter out AJT/AJs from unified jobs lists
Comment out placeholder in serializer
2019-08-27 15:36:17 -04:00
beeankha
453e142635
Fix UJT-related error, add notification placeholders
2019-08-27 15:35:43 -04:00
beeankha
9cfed6f2a8
Add check for no-op case back, remove redundant on_commit code
2019-06-17 10:47:58 -04:00
beeankha
95896b1acd
Edit wfj running notification trigger
2019-06-17 10:47:58 -04:00