Commit Graph

7879 Commits

Author SHA1 Message Date
Satoe Imaishi
f3f781917a Skip pbr license check if ansible-runner isn't a released version 2021-09-28 11:07:30 -04:00
Marcelo Moreira de Mello
270f6c4abd Merge pull request #11143 from tchellomello/fix_streamtls_when_not_present
Fixed logic to avoid tracebacks when node_name is invalid
2021-09-27 11:49:11 -04:00
Alan Rominger
3664cc3369 Disable autodiscovery except for docker-compose (#11142) 2021-09-27 11:36:11 -04:00
Marcelo Moreira de Mello
2204e03123 Fixed logic to avoid tracebacks when node_name is invalid 2021-09-27 11:28:28 -04:00
Marcelo Moreira de Mello
6d4b4cac37 Project updates must run on controller nodes
For project update jobs triggered by a job template run,
we must ensure that the project_update job runs on the same
controller node that dispatched the original job template; otherwise
the job might fail because it cannot find the playbook YAML file.
2021-09-25 23:05:45 -04:00
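A minimal sketch of the rule described in the commit above: a project update spawned by a job template run inherits the controller node of the job that triggered it. The `Job` dataclass and `spawn_project_update` helper are hypothetical names used only for illustration; they are not AWX's actual models or task-manager code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Job:
    """Hypothetical stand-in for a unified job record."""
    name: str
    controller_node: Optional[str] = None


def spawn_project_update(parent: Job) -> Job:
    """Create a dependent project update pinned to the parent's controller node.

    Assumption: the playbook checkout lives on the controller that dispatched
    the parent job, so the update has to run on that same node.
    """
    if parent.controller_node is None:
        raise ValueError("parent job has not been assigned a controller node")
    return Job(
        name=f"project_update_for_{parent.name}",
        controller_node=parent.controller_node,
    )
```

Refusing to spawn the update when the parent has no controller node is a design choice of this sketch, not something the commit message states.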
Alan Rominger
3fc63489f1 Filter controller_node selection to online nodes (#11120) 2021-09-24 23:01:32 -04:00
Marcelo Moreira de Mello
471f47cd9e Merge pull request #11093 from ansible/receptor_control_service_tls
Introduce the control-service TLS support on receptor
2021-09-24 12:26:19 -04:00
Rebeccah Hunter
e5dbb592fa Merge pull request #11074 from rebeccahhh/no_duplicate_uuids
prevent duplicate UUIDs from being created and allow users to update hostnames based on uuid
2021-09-24 11:52:55 -04:00
Shane McDonald
c4d8485c81 Update license test to work with http(s) urls in requirements files 2021-09-24 10:16:11 -04:00
Marcelo Moreira de Mello
045785c36f Refactored get_conn_type() method to use Enum 2021-09-23 10:51:50 -04:00
Marcelo Moreira de Mello
45600d034d Initial StreamTLS support for receptor nodes 2021-09-23 10:50:17 -04:00
Alan Rominger
62e9e7ea80 Avoid setting controller_node to an execution node for container jobs (#11117) 2021-09-23 09:16:10 -04:00
Rebeccah
a9f4011a45 defensive code for getting instance added, also simplified nested if
statements, rewrote some comments, and added a logger warning that the instance is being grabbed by the hostname and not the UUID
2021-09-21 16:54:11 -04:00
Rebeccah
55f2125a51 if the user provides a uuid and it exists, allow that to tie to the instance, which allows the user to update the instance based on the UUID (including updating the hostname) should they choose to do so. 2021-09-21 16:54:11 -04:00
Alan Rominger
1319fadc60 Fix overwrite bug where hosts with no instance ID var are re-created (#10910)
* Write tests to assure air-tightness of overwrite

* Candidate fix for group overwrite air-tightness

* Another proposed fix for the id mapping

* Further double down on tracking old instance_id

* Separate unchanging data case and fix some test issues

* parametrize final confusion test

* cut down some more on test cases and fix bug with prior fix

* Rewrite of _delete_host code sharing with update method

This is a start-to-finish rewrite of the host overwrite bug fix.
This method is much more conservative:
it keeps the overall code structure, where hosts
are deleted before host updates are made.

To fix the bug, we share code between the method that deletes hosts
and the method that updates the hosts.
A data structure is created and passed to both methods.

By having both methods use the same data structure, which maps
the in-memory hosts to DB hosts, we assure that the deletions
will ONLY delete hosts that will not be updated.
2021-09-16 15:29:57 -04:00
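The shared-mapping idea in the commit above can be sketched roughly as follows. This is an illustration under stated assumptions, not the inventory import code itself: `build_host_map`, `hosts_to_delete`, and the dictionary shapes are hypothetical, and matching by instance ID first and name second is an assumed rule.

```python
from typing import Dict, Optional, Set


def build_host_map(
    mem_hosts: Dict[str, Optional[str]],
    db_hosts: Dict[int, dict],
) -> Dict[str, int]:
    """Map each in-memory host name to the pk of the DB host it will update.

    mem_hosts: host name -> instance ID (or None when the variable is absent).
    db_hosts:  pk -> {"name": ..., "instance_id": ...}.
    """
    by_instance_id = {
        h["instance_id"]: pk for pk, h in db_hosts.items() if h.get("instance_id")
    }
    by_name = {h["name"]: pk for pk, h in db_hosts.items()}

    mapping: Dict[str, int] = {}
    for name, instance_id in mem_hosts.items():
        pk = by_instance_id.get(instance_id) if instance_id else None
        if pk is None:
            # Fall back to the host name when there is no instance ID match.
            pk = by_name.get(name)
        if pk is not None:
            mapping[name] = pk
    return mapping


def hosts_to_delete(db_hosts: Dict[int, dict], mapping: Dict[str, int]) -> Set[int]:
    """Delete only DB hosts that no in-memory host is mapped to."""
    return set(db_hosts) - set(mapping.values())
```

Because the delete step and the update step both read the same `mapping`, a host that is going to be updated can never appear in the delete set.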
Alan Rominger
e914c23c42 Pass --delete flag to worker for execution node cleanup (#11078)
* Pass --delete flag to worker for execution node cleanup

* Remove the pdd_wrapper_ directory
2021-09-16 11:21:41 -04:00
beeankha
48eb06f320 Add verify_ssl to container_auth_data params 2021-09-16 09:49:53 -04:00
beeankha
ac8b49b39d Change the way auth info is passed to Runner for EEs pulled from protected registries 2021-09-15 08:49:28 -04:00
Jim Ladd
1b50db26b6 Explicitly pass in UUID to get_or_register
Co-authored-by: Alan Rominger <arominge@redhat.com>
2021-09-14 10:58:29 -07:00
Alan Rominger
46ac9506e6 Assure consistent ordering with default IG first (#11034)
* Assure consistent ordering with default IG first

* Write conditional a little more defensively to pass tests
2021-09-08 11:11:46 -04:00
Alan Rominger
6a17e5b65b Allow manually running a health check, and make other adjustments to the health check trigger (#11002)
* Fully finalize the planned work for health checks of execution nodes

* Implementation of instance health_check endpoint

* Also do version conditional to node_type

* Do not use receptor mesh to check main cluster nodes health

* Fix bugs from testing health check of cluster nodes, add doc

* Add a few fields to health check serializer missed before

* Light refactoring of error field processing

* Fix errors clearing error, write more unit tests

* Update health check info in docs

* Bump migration of health check after rebase

* Mark string for translation

* Add related health_check link for system auditors too

* Handle health_check cluster node timeout, add errors for peer judgement
2021-09-03 16:37:37 -04:00
Jim Ladd
f39834ad82 pass uuid to Instance.create 2021-09-03 10:05:15 -07:00
Jim Ladd
bdb13343bb remove unused import 2021-09-03 10:05:15 -07:00
Jim Ladd
262cd3c695 set default uuid 2021-09-03 10:05:15 -07:00
Jim Ladd
f02099e8b7 provision_instance should create new uuid if needed
.. instead of defaulting to the current system's UUID

related: #10990
2021-09-03 10:05:15 -07:00
Jeff Bradberry
81fe39f060 Merge pull request #10929 from ansible/validate-control-only-nodes
Validate that control-only Instance nodes cannot change IG membership
2021-09-01 09:24:40 -04:00
Alan Rominger
5a6e9a06e2 Exclude control-only nodes from IG policy calculations (#10985)
* Exclude control-only nodes from IG policy calculations

Also, as a reverse to this, exclude execution-only nodes from
  the calculations if the group in question is the controlplane

* Incorporate review comments
2021-09-01 08:09:46 -04:00
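The filtering rule described in the commit above might look like the sketch below. The `eligible_instances` function and the dictionary records are hypothetical; the `node_type` values `control`, `execution`, and `hybrid`, and the group name `controlplane`, are taken as assumptions from the surrounding commit messages.

```python
from typing import Dict, List


def eligible_instances(
    instances: List[Dict[str, str]],
    group_name: str,
    controlplane_name: str = "controlplane",
) -> List[Dict[str, str]]:
    """Return the instances that should count toward a group's policy math.

    Assumed rule: the controlplane group ignores execution-only nodes,
    and every other group ignores control-only nodes.
    """
    if group_name == controlplane_name:
        return [i for i in instances if i["node_type"] != "execution"]
    return [i for i in instances if i["node_type"] != "control"]
```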
Alan Rominger
dc4b014d12 Make status command in error handling cleaner (#10823) 2021-08-31 12:02:39 -04:00
Jeff Bradberry
a2b984a1a5 Validate that control-only Instance nodes cannot change IG membership 2021-08-30 16:00:23 -04:00
Alan Rominger
68f79a1f3a Always use controlplane as project update backup IG (#10949)
* Always use controlplane as project update backup IG

Before, this was done conditionally for container_group jobs.
This change makes it so that controlplane will always be a
firm backstop for project updates

* Code a little more defensively to make unit tests pass

* Fix unit tests
2021-08-30 14:23:09 -04:00
quasd
637d6173bc Check dynamic_input fields also with has_inputs(). Fixes
using credential plugins in Container Registry credentials
with execution environments

Signed-off-by: quasd <qquasd@gmail.com>
2021-08-30 16:10:34 +03:00
Alan Rominger
424dbe8208 Use ansible-runner imports for cpu and memory calculation (#10954)
* Use ansible-runner imports for cpu and memory calculation

* Fix bug with capacity and memory adjustment
2021-08-27 21:46:53 -04:00
Jim Ladd
467a37f8fe use settings.DEFAULT_EXECUTION_QUEUE_NAME in lieu of default 2021-08-26 11:15:14 -07:00
Jim Ladd
88a6412b54 only need to update IG's policy_instance_list field 2021-08-26 11:15:14 -07:00
Jim Ladd
502eaf9fb9 handle case where default IG does not exist
* also, only add discovered execution node to default group
  if `register`-ing the node actually resulted in a confirmed
  change
2021-08-26 11:15:14 -07:00
Jim Ladd
de8eab0434 inspect_execution_nodes should *not* block when retrieving lock
* would otherwise hold up cluster heartbeat task
* furthermore, only really need one node to run through
  `inspect_execution_nodes` each interval
2021-08-26 11:15:14 -07:00
Jim Ladd
f317fca9e4 add auto-discovered nodes to default IG
* add advisory_lock to avoid IG update race logic
* update IG by way of policy_instance_list
2021-08-26 11:15:14 -07:00
Jim Ladd
561fc289fb disable discovered instances by default 2021-08-26 11:15:14 -07:00
Alan Rominger
daf4310176 Clean up work_type processing and fix execution vs control capacity (#10930)
* Clean up added work_type processing for mesh_code branch

* track both execution and control capacity

* Remove unused execution_capacity property

* Count all forms of capacity to make test pass

* Force jobs to be on execution nodes, updates on control nodes

* Introduce capacity_type property to abstract some details out

* Update test to cover all job types at same time

* Register OpenShift nodes as control types

* Remove unqualified consumed_capacity from task manager and make unit tests work

* Update unit test to execution vs control TM logic changes

* Fix bug, else handling for work_type method
2021-08-26 07:24:14 -04:00
Alan Rominger
42484cf98d Obtain receptor sockfile from the receptor.conf file (#10932) 2021-08-24 11:20:21 -04:00
Shane McDonald
274e487a96 Attempt to surface streaming errors that were being eaten (#10918) 2021-08-24 10:33:00 -04:00
Alan Rominger
940c189c12 Corresponding AWX changes for runner --worker-info schema update (#10926) 2021-08-24 08:41:36 -04:00
Alan Rominger
c3ad479fc6 Minor tweaks for the mesh_code branch from review (#10902) 2021-08-24 08:41:35 -04:00
Alan Rominger
928c35ede5 Model changes for instance last_seen field to replace modified (#10870)
* Model changes for instance last_seen field to replace modified

* Break up refresh_capacity into smaller units

* Rename execution node methods, fix last_seen clustering

* Use update_fields to make it clear save only affects capacity

* Restructuring to pass unit tests

* Fix bug where a PATCH did not update capacity value
2021-08-24 08:41:35 -04:00
beeankha
1a9fcdccc2 Change place where controller node is being looked for in the task manager 2021-08-24 08:41:35 -04:00
Alan Rominger
3b1e40d227 Use the ansible-runner worker --worker-info to perform execution node capacity checks (#10825)
* Introduce utilities for --worker-info health check integration

* Handle case where ansible-runner is not installed

* Add ttl parameter for health check

* Reformulate return data structure and add lots of error cases

* Move up the cleanup tasks, close sockets

* Integrate new --worker-info into the execution node capacity check

* Undo the raw value override from the PoC

* Additional refinement to execution node check frequency

* Put in more complete network diagram

* Follow up on comment to remove modified from health check responsibilities
2021-08-24 08:41:35 -04:00
Alan Rominger
4e84c7c4c4 Use the existing get_receptor_ctl method (#10813) 2021-08-24 08:41:35 -04:00
Alan Rominger
f47eb126e2 Adopt the node_type field in receptor logic (#10802)
* Adopt the node_type field in receptor logic

* Refactor Instance.objects.register so we do not reset capacity to 0
2021-08-24 08:41:34 -04:00
Alan Rominger
b53d3bc81d Undo some things not compatible with hybrid node hack (#10763) 2021-08-24 08:41:34 -04:00
Alan Rominger
9881bb72b8 Treat the awx_1 node as a hybrid node for now, use local work type (#10726) 2021-08-24 08:40:21 -04:00