Commit Graph

2253 Commits

Author SHA1 Message Date
Alan Rominger
7b35902d33 Respect settings to keep files and work units
Add new logic to cleanup orphaned work units
  from administrative tasks

Remove noisy log which is often irrelevant
  about running-cleanup-on-execution-nodes
  we already have other logs for this
2021-11-10 08:50:10 +08:00
Chris Meyers
6a2826b91c Merge pull request #11088 from saito-hideki/issue/10879
Fixed Org mapping behavior with SAML when Ansible Galaxy cred does not exist
2021-10-08 10:48:11 -04:00
Alan Rominger
b70793db5c Consolidate cleanup actions under new ansible-runner worker cleanup command (#11160)
* Primary development of integrating runner cleanup command

* Fixup image cleanup signals and their tests

* Use alphabetical sort to solve the cluster coordination problem

* Update test to new pattern

* Clarity edits to interface with ansible-runner cleanup method

* Another change corresponding to ansible-runner CLI updates

* Fix incomplete implementation of receptor remote cleanup

* Share receptor utils code between worker_info and cleanup

* Complete task logging from calling runner cleanup command

* Wrap up unit tests and some contract changes that fall out of those

* Fix bug in CLI construction

* Fix queryset filter bug
2021-10-05 16:32:03 -04:00
Alan Rominger
3fc63489f1 Filter controller_node selection to online nodes (#11120) 2021-09-24 23:01:32 -04:00
Rebeccah
55f2125a51 if the user provides a uuid and it exists, allow that to tie to the instance, which allows the user to update the instance based on the UUID (includeding updating the hostname) should they choose to do so. 2021-09-21 16:54:11 -04:00
Hideki Saito
9e74ac24fa Fixed Org mapping behavior with SAML when Ansible Galaxy cred does not exist
- Fixes #10879
- Fixes ansible/tower#5061

Signed-off-by: Hideki Saito <saito@fgrep.org>
2021-09-16 23:25:50 +09:00
Alan Rominger
46ac9506e6 Assure consistent ordering with default IG first (#11034)
* Assure consistent ordering with default IG first

* Write conditional a little more defensively to pass tests
2021-09-08 11:11:46 -04:00
Alan Rominger
6a17e5b65b Allow manually running a health check, and make other adjustments to the health check trigger (#11002)
* Full finalize the planned work for health checks of execution nodes

* Implementation of instance health_check endpoint

* Also do version conditional to node_type

* Do not use receptor mesh to check main cluster nodes health

* Fix bugs from testing health check of cluster nodes, add doc

* Add a few fields to health check serializer missed before

* Light refactoring of error field processing

* Fix errors clearing error, write more unit tests

* Update health check info in docs

* Bump migration of health check after rebase

* Mark string for translation

* Add related health_check link for system auditors too

* Handle health_check cluster node timeout, add errors for peer judgement
2021-09-03 16:37:37 -04:00
Jim Ladd
262cd3c695 set default uuid 2021-09-03 10:05:15 -07:00
Alan Rominger
68f79a1f3a Always use controlplane as project update backup IG (#10949)
* Always use controlplane as project update backup IG

Before, this was done conditionally to container_group jobs
this logic changes it so that controlgroup will always be a
firm backstop for project updates

* Code a little more defensively to make unit tests pass

* Fix unit tests
2021-08-30 14:23:09 -04:00
quasd
637d6173bc Check dynamic_input fields also with has_inputs() - Fixes,
using credential plugins in Container Registry credential,
with execution environments

Signed-off-by: quasd <qquasd@gmail.com>
2021-08-30 16:10:34 +03:00
Alan Rominger
424dbe8208 Use ansible-runner imports for cpu and memory calculation (#10954)
* Use ansible-runner imports for cpu and memory calculation

* Fix bug with capacity and memory adjustment
2021-08-27 21:46:53 -04:00
Alan Rominger
daf4310176 Clean up work_type processing and fix execution vs control capacity (#10930)
* Clean up added work_type processing for mesh_code branch

* track both execution and control capacity

* Remove unused execution_capacity property

* Count all forms of capacity to make test pass

* Force jobs to be on execution nodes, updates on control nodes

* Introduce capacity_type property to abstract some details out

* Update test to cover all job types at same time

* Register OpenShift nodes as control types

* Remove unqualified consumed_capacity from task manager and make unit tests work

* Remove unqualified consumed_capacity from task manager and make unit tests work

* Update unit test to execution vs control TM logic changes

* Fix bug, else handling for work_type method
2021-08-26 07:24:14 -04:00
Alan Rominger
940c189c12 Corresponding AWX changes for runner --worker-info schema update (#10926) 2021-08-24 08:41:36 -04:00
Alan Rominger
928c35ede5 Model changes for instance last_seen field to replace modified (#10870)
* Model changes for instance last_seen field to replace modified

* Break up refresh_capacity into smaller units

* Rename execution node methods, fix last_seen clustering

* Use update_fields to make it clear save only affects capacity

* Restructing to pass unit tests

* Fix bug where a PATCH did not update capacity value
2021-08-24 08:41:35 -04:00
Alan Rominger
3b1e40d227 Use the ansible-runner worker --worker-info to perform execution node capacity checks (#10825)
* Introduce utilities for --worker-info health check integration

* Handle case where ansible-runner is not installed

* Add ttl parameter for health check

* Reformulate return data structure and add lots of error cases

* Move up the cleanup tasks, close sockets

* Integrate new --worker-info into the execution node capacity check

* Undo the raw value override from the PoC

* Additional refinement to execution node check frequency

* Put in more complete network diagram

* Followup on comment to remove modified from from health check responsibilities
2021-08-24 08:41:35 -04:00
Alan Rominger
f47eb126e2 Adopt the node_type field in receptor logic (#10802)
* Adopt the node_type field in receptor logic

* Refactor Instance.objects.register so we do not reset capacity to 0
2021-08-24 08:41:34 -04:00
Alan Rominger
b53d3bc81d Undo some things not compatible with hybrid node hack (#10763) 2021-08-24 08:41:34 -04:00
Alan Rominger
9881bb72b8 Treat the awx_1 node as a hybrid node for now, use local work type (#10726) 2021-08-24 08:40:21 -04:00
Alan Rominger
f597205fa7 Run capacity checks with container isolation (#10688)
This requires swapping out the container images
  for the execution nodes from awx-ee to the awx image

For completeness, the hop node image is switched to the raw
  receptor image

A few outright bugs are fixed here
  memory calculation just was not right at all
  the execution_capacity calculation was reverse of intention

Drop in a few TODOs about error handling from debugging
2021-08-24 08:40:19 -04:00
Alan Rominger
13300bdbd4 Update rebase to keep old control plane capacity check
Also do some basic work to separate control versus execution capacity
  this is to assure that we don't send jobs to the control node
2021-08-24 08:40:19 -04:00
Alan Rominger
39e23db523 Make minor changes to add needed imports 2021-08-24 08:40:19 -04:00
Ryan Petrello
05cb876df5 implement an initial development environment for receptor-based clusters 2021-08-24 08:40:18 -04:00
Rebeccah
706f3f97ea add a new field to the instance model for use with receptor changes (incoming) 2021-07-26 15:53:56 -04:00
Alan Rominger
b329d9cbf4 Make quay.io the registry URL default 2021-06-29 13:26:51 -04:00
Shane McDonald
03fb12d4c2 Force system jobs to always run on control plane 2021-06-28 10:51:04 -04:00
Shane McDonald
0d7ef709bf Prevent jobs from trying to run on controlplane in k8s 2021-06-28 10:51:02 -04:00
Alan Rominger
a8083296e6 Make safe variable duplications for rename (#5093)
* Make safe variable duplications to controller

* Back out some changes to be smarter

* Update test to new vars
2021-06-22 10:49:38 -04:00
Alan Rominger
cc616206b3 Pass current apps to CredentialType setup method 2021-06-22 10:49:37 -04:00
Christian M. Adams
06b04007a0 Rename managed_by_tower to managed 2021-06-22 10:49:36 -04:00
Amol Gautam
b64c2d6861 Removed references to tower in InventorySource and Credentials
--- Removed reference to tower in  InventorySource and InventoryUpdate model
--- Added a migration for above change
--- Added new CONTROLLER* variables in awx/main/models/credentials/__init__.py
--- Migrated awxkit to new CONTROLLER* variables
--- Updated the tests to use new CONTROLLER* variables
--- Fix some issues with upgrade path, rename more cases
2021-06-22 10:49:35 -04:00
Alan Rominger
5a7a1b8f20 Remove inventory_as_dict which would mess with downstream templating (#5085) 2021-06-22 10:49:34 -04:00
Bill Nottingham
1e68519c99 Remove insights_credential from inventory 2021-06-22 10:49:33 -04:00
Seth Foster
75a27c38c2 Add a periodic task to reap unreleased receptor work units
- Add work_unit_id field to UnifiedJob
2021-06-22 10:49:31 -04:00
Shane McDonald
03e73156ea Allow inventory updates to run in container groups 2021-06-16 17:11:43 -04:00
Jeff Bradberry
85bb4e976f Execution environments are meaningless for workflows
so, remove them from the API endpoints for workflows.  Also, tear out
the WFJT.execution_environment step in the resolver.  If we want that
to be a thing, it ought to be a .default_environment instead.
2021-06-16 15:41:09 -04:00
softwarefactory-project-zuul[bot]
b10eb6f4c2 Merge pull request #10382 from AlanCoding/downstream_waterslide
Handle inventory types where Automation Hub collection names differ

Some collections will be moving, which is #10323
Yet, other collections will change name, which this is intended to address.

Reviewed-by: Alan Rominger <arominge@redhat.com>
Reviewed-by: Sarabraj Singh <singh.sarabraj@gmail.com>
Reviewed-by: Elijah DeLee <kdelee@redhat.com>
2021-06-16 18:15:08 +00:00
Alan Rominger
21aa1fc11f Handle inventory types where Automation Hub collection names differ
Move imports added by Bill to be in-line, because utils should not import models at top

Remove more get_licenser inline imports
2021-06-16 13:39:52 -04:00
Seth Foster
61846e88ca Add validator for ee image field name
awxkit default ee image name is now a fixed valid (but bogus) name, rather than random unicode
2021-06-15 20:06:49 -04:00
Bill Nottingham
f460f70513 Remove host insights view 2021-06-09 16:34:32 -04:00
Bill Nottingham
be18803250 Add support for Insights as an inventory source 2021-06-09 16:34:32 -04:00
Jeff Bradberry
9aa56b1247 Update the EE resolver logic
so that the control plane managed EE is kept separate.
2021-06-09 10:03:35 -04:00
softwarefactory-project-zuul[bot]
3340ef9c91 Merge pull request #10053 from AlanCoding/dropsies
Intentionally drop job event websocket messages in excess of 30 per second (configurable)

SUMMARY
The UI no longer follows the latest job events from websocket messages. Because of that, there's no reason to send messages for all events if the job event rate is high.
I used 30 because this is the number of events that I guesstimate will show in one page in the UI.
Needs the setting added in the UI.
This adds skip_websocket_message to event event_data. We could promote it to a top-level key for job events, if that is preferable aesthetically. Doing this allows us to test this feature without having to connect a websocket client. Ping @mabashian @chrismeyersfsu
ISSUE TYPE

Feature Pull Request

COMPONENT NAME

API
UI

ADDITIONAL INFORMATION
Scenario walkthrough:
a job is producing 1,000 events per second. User launches it, the screen fills up in, say 1/4 of a second. The scrollbar indicates content beyond the bottom of the screen. Now, for 3/4ths of a second, the scrollbar stays still. After that, it updates the scrollbar to the current line number that the job is on. The scrollbar continues to update the length of the output effectively once per second.

Reviewed-by: Alan Rominger <arominge@redhat.com>
Reviewed-by: Chris Meyers <None>
Reviewed-by: Jake McDermott <yo@jakemcdermott.me>
2021-06-08 20:10:45 +00:00
softwarefactory-project-zuul[bot]
a6383e7f79 Merge pull request #10324 from shanemcd/default_queue_name
Introduce distinct controlplane instance group

Reviewed-by: Alan Rominger <arominge@redhat.com>
Reviewed-by: Shane McDonald <me@shanemcd.com>
Reviewed-by: Matthew Jones <bsdmatburt@gmail.com>
Reviewed-by: Yanis Guenane <None>
Reviewed-by: Bianca Henderson <beeankha@gmail.com>
2021-06-08 19:18:35 +00:00
Alan Rominger
15effd7ade Add some conditions for always-send and never-send event types
Always send websocket messages for
  high priority events like playbook_on_stats

Never send websocket messages for
  events with no output
  unless they are a high priority event type
2021-06-08 13:33:53 -04:00
Alan Rominger
b551608f16 Move websocket skip logic into event_handler 2021-06-08 13:33:22 -04:00
Alan Rominger
b43d8e2c7f Enforce a 30-per-second max event websocket rate 2021-06-08 13:32:05 -04:00
softwarefactory-project-zuul[bot]
df8ce801cf Merge pull request #10325 from wenottingham/count-von-count
Add a field for hosts automated across to the subscription info

SUMMARY
This is populated by the new table we've added.
Update the subs check to check against this, not imported hosts.

ISSUE TYPE

Feature Pull Request
Bugfix Pull Request

COMPONENT NAME

API
UI

Reviewed-by: Bill Nottingham <None>
Reviewed-by: Amol Gautam <amol_gautam25@yahoo.co.in>
Reviewed-by: Alex Corey <Alex.swansboro@gmail.com>
Reviewed-by: Chris Meyers <None>
Reviewed-by: Tiago Góes <tiago.goes2009@gmail.com>
2021-06-08 15:30:43 +00:00
Alan Rominger
b26eaa3bd2 Remove uses of ansible_virtualenv_path 2021-06-07 21:14:35 -04:00
Bill Nottingham
ca07946c24 Adjust subscription calculation & warning behavior
- Add a field for hosts automated across

  This is populated by the new table we've added.
- Update the subs check to check against this, not imported hosts.
- Reword messages on inventory import
2021-06-07 15:47:55 -04:00