External-Mirrors/awx

mirror of https://github.com/ansible/awx.git synced 2026-02-15 18:20:00 -03:30

Author	SHA1	Message	Date
Alan Rominger	97ce7d226b	Error handling when node is missing from mesh for jobs and checks	2021-10-19 10:06:02 -04:00
chris meyers	098201be0b	avoid work_results and work release race * Unsure exactly why this happens but there seems to be a race condition related to the time window between Receptor work_results and work release. This sleep extends that window and hopefully avoids the race condition.	2021-10-15 16:27:24 -04:00
Bianca Henderson	0e15c269cd	Merge pull request #5351 from ansible/exec_node_error_handling [4.1] Improve Error Handling for When Job Cannot Be Delivered to an Execution Node	2021-10-15 13:29:27 -04:00
Alan Rominger	54e3377254	Respect settings to keep files and work units Add new logic to cleanup orphaned work units from administrative tasks Remove noisy log which is often irrelevant about running-cleanup-on-execution-nodes we already have other logs for this	2021-10-15 11:36:36 -04:00
Bianca Henderson	dd622dcc30	Change log level from 'warning' to 'exception'	2021-10-15 08:08:24 -04:00
Bianca Henderson	412bf7f282	Move error handling into try/catch block	2021-10-14 14:52:38 -04:00
chris meyers	611a537b55	add missing create partition for scm backed inv * This will resolve missing project update job events issue	2021-10-13 07:51:40 -04:00
chris meyers	64811d0b6b	fix python black lint requirements	2021-10-12 17:09:30 -04:00
Amol Gautam	f79a57c3e2	Changed Work Submission parameter for K8s work	2021-10-11 08:10:26 -07:00
Alan Rominger	b70793db5c	Consolidate cleanup actions under new `ansible-runner worker cleanup` command (#11160 ) * Primary development of integrating runner cleanup command * Fixup image cleanup signals and their tests * Use alphabetical sort to solve the cluster coordination problem * Update test to new pattern * Clarity edits to interface with ansible-runner cleanup method * Another change corresponding to ansible-runner CLI updates * Fix incomplete implementation of receptor remote cleanup * Share receptor utils code between worker_info and cleanup * Complete task logging from calling runner cleanup command * Wrap up unit tests and some contract changes that fall out of those * Fix bug in CLI construction * Fix queryset filter bug	2021-10-05 16:32:03 -04:00
Amol Gautam	24a6edef9e	AWX dev environment changes for receptor work signing feature -- Updated devel build to take most recent receptor binary -- Added signWork parameter when sending job to receptor -- Modified docker-compose tasks to generate RSA key pair to use for work-signing -- Modified docker-compose templates and jinja templates for implementing work-sign -- Modified Firewall rules on the receptor jinja config Add firewall rules to dev env	2021-10-05 11:41:34 -07:00
Alan Rominger	af5f8e8a4a	Always set project sync execution_node to current host (#11204 )	2021-10-05 13:08:40 -04:00
Jim Ladd	815a45cf2f	call 'work cancel' on inactive controller jobs	2021-10-01 12:55:06 -07:00
Alan Rominger	7c9626b0e7	Fix bug that would run --worker-info health checks on control or hybrid nodes (#11161 ) * Fix bug that would run health check on control nodes * Prevent running execution node health check against main cluster nodes	2021-09-29 09:34:22 -04:00
Alan Rominger	3664cc3369	Disable autodiscovery except for docker-compose (#11142 )	2021-09-27 11:36:11 -04:00
Marcelo Moreira de Mello	6d4b4cac37	Project updates must run on controller nodes For project update jobs triggered by a job template run, we must ensure that the project_update job runs on the same controller that dispatched the original job template, otherwise the job might fail because it cannot find the playbook YAML file.	2021-09-25 23:05:45 -04:00
Marcelo Moreira de Mello	045785c36f	Refactored get_conn_type() method to use Enum	2021-09-23 10:51:50 -04:00
Marcelo Moreira de Mello	45600d034d	Initial StreamTLS support for receptor nodes	2021-09-23 10:50:17 -04:00
Alan Rominger	e914c23c42	Pass --delete flag to worker for execution node cleanup (#11078 ) * Pass --delete flag to worker for execution node cleanup * Remove the pdd_wrapper_ directory	2021-09-16 11:21:41 -04:00
beeankha	48eb06f320	Add verify_ssl to container_auth_data params	2021-09-16 09:49:53 -04:00
beeankha	ac8b49b39d	Change the way auth info is passed to Runner for EEs pulled from protected registries	2021-09-15 08:49:28 -04:00
Alan Rominger	6a17e5b65b	Allow manually running a health check, and make other adjustments to the health check trigger (#11002 ) * Full finalize the planned work for health checks of execution nodes * Implementation of instance health_check endpoint * Also do version conditional to node_type * Do not use receptor mesh to check main cluster nodes health * Fix bugs from testing health check of cluster nodes, add doc * Add a few fields to health check serializer missed before * Light refactoring of error field processing * Fix errors clearing error, write more unit tests * Update health check info in docs * Bump migration of health check after rebase * Mark string for translation * Add related health_check link for system auditors too * Handle health_check cluster node timeout, add errors for peer judgement	2021-09-03 16:37:37 -04:00
Alan Rominger	5a6e9a06e2	Exclude control-only nodes from IG policy calculations (#10985 ) * Exclude control-only nodes from IG policy calculations Also, as a reverse to this, exclude execution-only nodes from the calculations if the group in question is the controlplane * Incorporate review comments	2021-09-01 08:09:46 -04:00
Alan Rominger	dc4b014d12	Make status command in error handling cleaner (#10823 )	2021-08-31 12:02:39 -04:00
Jim Ladd	467a37f8fe	use settings.DEFAULT_EXECUTION_QUEUE_NAME in lieu of default	2021-08-26 11:15:14 -07:00
Jim Ladd	88a6412b54	only need to update IG's policy_instance_list field	2021-08-26 11:15:14 -07:00
Jim Ladd	502eaf9fb9	handle case where default IG does not exist * also, only add discovered execution node to default group if `register`-ing the node actually resulted in a confirmed change	2021-08-26 11:15:14 -07:00
Jim Ladd	de8eab0434	inspect_execution_nodes should not block when retrieving lock * would otherwise hold up cluster heartbeat task * furthermore, only really need one node to run through `inspect_execution_nodes` each interval	2021-08-26 11:15:14 -07:00
Jim Ladd	f317fca9e4	add auto-discovered nodes to default IG * add advisory_lock to avoid IG update race logic * update IG by way of policy_instance_list	2021-08-26 11:15:14 -07:00
Jim Ladd	561fc289fb	disable discovered instances by default	2021-08-26 11:15:14 -07:00
Alan Rominger	daf4310176	Clean up work_type processing and fix execution vs control capacity (#10930 ) * Clean up added work_type processing for mesh_code branch * track both execution and control capacity * Remove unused execution_capacity property * Count all forms of capacity to make test pass * Force jobs to be on execution nodes, updates on control nodes * Introduce capacity_type property to abstract some details out * Update test to cover all job types at same time * Register OpenShift nodes as control types * Remove unqualified consumed_capacity from task manager and make unit tests work * Remove unqualified consumed_capacity from task manager and make unit tests work * Update unit test to execution vs control TM logic changes * Fix bug, else handling for work_type method	2021-08-26 07:24:14 -04:00
Shane McDonald	274e487a96	Attempt to surface streaming errors that were being eaten (#10918 )	2021-08-24 10:33:00 -04:00
Alan Rominger	940c189c12	Corresponding AWX changes for runner --worker-info schema update (#10926 )	2021-08-24 08:41:36 -04:00
Alan Rominger	928c35ede5	Model changes for instance last_seen field to replace modified (#10870 ) * Model changes for instance last_seen field to replace modified * Break up refresh_capacity into smaller units * Rename execution node methods, fix last_seen clustering * Use update_fields to make it clear save only affects capacity * Restructuring to pass unit tests * Fix bug where a PATCH did not update capacity value	2021-08-24 08:41:35 -04:00
Alan Rominger	3b1e40d227	Use the ansible-runner worker --worker-info to perform execution node capacity checks (#10825 ) * Introduce utilities for --worker-info health check integration * Handle case where ansible-runner is not installed * Add ttl parameter for health check * Reformulate return data structure and add lots of error cases * Move up the cleanup tasks, close sockets * Integrate new --worker-info into the execution node capacity check * Undo the raw value override from the PoC * Additional refinement to execution node check frequency * Put in more complete network diagram * Followup on comment to remove modified from from health check responsibilities	2021-08-24 08:41:35 -04:00
Alan Rominger	4e84c7c4c4	Use the existing get_receptor_ctl method (#10813 )	2021-08-24 08:41:35 -04:00
Alan Rominger	f47eb126e2	Adopt the node_type field in receptor logic (#10802 ) * Adopt the node_type field in receptor logic * Refactor Instance.objects.register so we do not reset capacity to 0	2021-08-24 08:41:34 -04:00
Alan Rominger	9881bb72b8	Treat the awx_1 node as a hybrid node for now, use local work type (#10726 )	2021-08-24 08:40:21 -04:00
Alan Rominger	f597205fa7	Run capacity checks with container isolation (#10688 ) This requires swapping out the container images for the execution nodes from awx-ee to the awx image For completeness, the hop node image is switched to the raw receptor image A few outright bugs are fixed here memory calculation just was not right at all the execution_capacity calculation was reverse of intention Drop in a few TODOs about error handling from debugging	2021-08-24 08:40:19 -04:00
Alan Rominger	e7be86867d	Fix rebase bug specific to ad hoc commands	2021-08-24 08:40:19 -04:00
Alan Rominger	13300bdbd4	Update rebase to keep old control plane capacity check Also do some basic work to separate control versus execution capacity this is to assure that we don't send jobs to the control node	2021-08-24 08:40:19 -04:00
Alan Rominger	39e23db523	Make minor changes to add needed imports	2021-08-24 08:40:19 -04:00
Ryan Petrello	05cb876df5	implement an initial development environment for receptor-based clusters	2021-08-24 08:40:18 -04:00
softwarefactory-project-zuul[bot]	68e309ee32	Merge pull request #10607 from AlanCoding/unused_exception Remove unused exception about custom venvs random cleanup Reviewed-by: Bianca Henderson <beeankha@gmail.com>	2021-07-09 15:43:00 +00:00
Alan Rominger	17f9b57028	Remove unused exception about custom venvs	2021-07-07 11:38:37 -04:00
Alan Rominger	e96080a512	No result_traceback is blank, not null	2021-07-07 11:37:30 -04:00
Alan Rominger	f126a6343b	Fix bug setting execution_node to null (not blank) (#5169 )	2021-06-28 10:51:06 -04:00
Shane McDonald	1ed170fff0	Don't overwrite result_traceback if it was already set.	2021-06-28 10:51:06 -04:00
Alan Rominger	390e1f9a0a	Fix obvious logical bug with project folder pre-creation (#5155 )	2021-06-28 10:51:04 -04:00
Shane McDonald	397908543d	Disable activity stream for updates in status handler	2021-06-28 10:51:04 -04:00

1258 Commits