Commit Graph

108 Commits

Author SHA1 Message Date
Seth Foster
7a1ed406da Remove CRUD for Receptor Addresses
Removes ability to directly create and delete
receptor addresses for a given node.

Instead, receptor addresses are created automatically
if listener_port is set on the Instance.

For example, patching a "hop" instance with
{"listener_port": 6667} will create a canonical
receptor address with port 6667.

Likewise, peers_from_control_nodes on the instance
sets peers_from_control_nodes on the canonical
address (if listener_port is also set).

protocol is a read-only field that simply reflects
the canonical address's protocol.

Other Changes:

- rename k8s_routable to is_internal
- add protocol to ReceptorAddress
- remove peers_from_control_nodes and listener_port
from Instance model

Signed-off-by: Seth Foster <fosterbseth@gmail.com>
2024-02-02 10:37:41 -05:00
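A minimal sketch of the PATCH described above, assuming token auth and the standard /api/v2/instances/<id>/ endpoint (host, instance ID, and token are illustrative):

```python
import requests

AWX_URL = "https://awx.example.com"            # illustrative host
HEADERS = {"Authorization": "Bearer <token>"}  # illustrative token

# Setting listener_port triggers automatic creation of a canonical
# receptor address with the same port; protocol is read-only and
# simply reflects that canonical address.
resp = requests.patch(
    f"{AWX_URL}/api/v2/instances/42/",
    json={"listener_port": 6667, "peers_from_control_nodes": True},
    headers=HEADERS,
)
resp.raise_for_status()
print(resp.json()["protocol"])  # e.g. "tcp", from the canonical address
```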
Hao Liu
6db66c5f81 Fix provision instance not respecting protocol 2024-02-02 10:37:41 -05:00
Seth Foster
9ba70c151d Add canonical receptor address
Creates a non-deletable address that acts as
the "main" address for this instance.

All other addresses for that instance must
be non-canonical.

When listener_port on an instance is set, automatically
create a canonical receptor address where:
  - address is hostname of instance
  - port is listener_port
  - canonical is True

Additionally, protocol field is added to instance to
denote the receptor listener protocol to use (ws, tcp).

The receptor config listener information is derived from
the listener_port and protocol information. Having a
canonical address that mirrors the listener_port ensures that
an address exists that matches the receptor config information.

Other changes:
- Add managed field to receptor address.
If managed is True, no fields on this address can be edited
via the API.
If canonical is True, only the address field itself cannot be edited.

- Add managed field to instance. If managed is True, users
cannot set node_state to deprovisioning (i.e. cannot delete node)

This changes our mechanism for preventing users from deleting
the mesh ingress hop node.

- Field is_internal is now renamed to k8s_routable

- Add reverse_peers on instance which is a list of instance IDs
that peer to this instance (via an address)

Signed-off-by: Seth Foster <fosterbseth@gmail.com>
2024-02-02 10:37:41 -05:00
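A rough sketch of the rule this commit describes, not the actual AWX code; the import path and lookup kwargs are assumptions:

```python
from awx.main.models import ReceptorAddress  # assumed import path

def ensure_canonical_address(instance):
    """Mirror listener_port into a canonical ReceptorAddress."""
    if instance.listener_port is None:
        return None
    addr, _created = ReceptorAddress.objects.get_or_create(
        instance=instance,
        canonical=True,  # the one non-deletable "main" address
        defaults={
            "address": instance.hostname,    # address is the instance hostname
            "port": instance.listener_port,  # port mirrors listener_port
        },
    )
    return addr
```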
Seth Foster
3a17c45b64 Register_peers support for receptor_addresses
register_peers has inputs:

source: source instance
peers: list of instances the source should peer to

InstanceLink "target" is now expected to be a ReceptorAddress

For each peer, we can just use the first receptor address. If
multiple receptor addresses exist, throw a command error.

Currently this command is only used on VM deployments, where
there is only a single receptor address per instance, so this
should work fine.

Other changes:
drop listener_port field from Instance. Listener port is now just
"port" on ReceptorAddress

Signed-off-by: Seth Foster <fosterbseth@gmail.com>
2024-02-02 10:37:41 -05:00
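The single-address rule above can be sketched as follows; the receptor_addresses related name is an assumption:

```python
from django.core.management.base import CommandError

def resolve_target_address(peer_instance):
    """Pick the one ReceptorAddress that becomes the InstanceLink target."""
    addresses = list(peer_instance.receptor_addresses.all())  # assumed related name
    if not addresses:
        raise CommandError(f"{peer_instance.hostname} has no receptor address")
    if len(addresses) > 1:
        raise CommandError(f"{peer_instance.hostname} has multiple receptor addresses")
    return addresses[0]
```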
Seth Foster
bca68bcdf1 Add install bundle support
group_vars all.yaml changes:
- peer entry has two fields, address and port
- receptor_port is inferred from the first
receptor_address entry that uses protocol tcp

other changes:
ActivityStream now records when receptor_addresses
are peered to

Signed-off-by: Seth Foster <fosterbseth@gmail.com>
2024-02-02 10:37:41 -05:00
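A small illustration of the receptor_port inference described above (data shapes and hostnames are hypothetical):

```python
def infer_receptor_port(receptor_addresses):
    """Return the port of the first entry using protocol tcp."""
    for entry in receptor_addresses:
        if entry["protocol"] == "tcp":
            return entry["port"]
    return None

addresses = [
    {"address": "hop1.example.com", "port": 27199, "protocol": "tcp"},
    {"address": "hop1.example.com", "port": 6667, "protocol": "ws"},
]
# peer entries keep only address and port; receptor_port comes from tcp
peers = [{"address": a["address"], "port": a["port"]} for a in addresses]
print(infer_receptor_port(addresses))  # 27199
```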
Seth Foster
c32f234ebb Add peers_from_control_nodes to ReceptorAddress
- write_receptor_config now peers control nodes to ReceptorAddress
entries that have peers_from_control_nodes enabled

- peers_from_control_nodes and listener_port removed from Instance model

- peers_from_control_nodes added to ReceptorAddress model

- ReceptorAddress is now unique by address and protocol combination

- Write receptor config task is dispatched upon ReceptorAddress creation
or deletion, and when a control node is first created

- InstanceLinkSerializer adds a target_address field and has logic
to grab the instance hostname associated with the peered ReceptorAddress

Signed-off-by: Seth Foster <fosterbseth@gmail.com>
2024-02-02 10:37:41 -05:00
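The uniqueness rule can be sketched as a Django constraint; this is a field subset, not the real model definition:

```python
from django.db import models

class ReceptorAddress(models.Model):
    instance = models.ForeignKey(
        "Instance", on_delete=models.CASCADE, related_name="receptor_addresses"
    )
    address = models.CharField(max_length=255)
    protocol = models.CharField(max_length=10, default="tcp")
    peers_from_control_nodes = models.BooleanField(default=False)

    class Meta:
        constraints = [
            # unique by address and protocol combination
            models.UniqueConstraint(
                fields=["address", "protocol"],
                name="unique_receptor_address_protocol",  # illustrative name
            ),
        ]
```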
Seth Foster
5cb3d3b078 Update receptor conf when address changes
Add post save and post delete hooks to
call write_receptor_config when
a receptor address is added / removed.

Add peers_from_control_nodes to
provision_instance

Signed-off-by: Seth Foster <fosterbseth@gmail.com>
2024-02-02 10:37:41 -05:00
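A sketch of the hook wiring; write_receptor_config is named in the commit, but the import paths and dispatch call here are assumptions:

```python
from django.db.models.signals import post_save, post_delete
from django.dispatch import receiver

from awx.main.models import ReceptorAddress       # assumed import path
from awx.main.tasks import write_receptor_config  # assumed import path

@receiver(post_save, sender=ReceptorAddress)
@receiver(post_delete, sender=ReceptorAddress)
def on_receptor_address_change(sender, instance, **kwargs):
    # Re-render receptor.conf whenever an address is added or removed;
    # the real code dispatches this as a background task.
    write_receptor_config.delay()
```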
Seth Foster
127a0cff23 Set ip_address to empty string
ip_address cannot be null, so set it to an
empty string instead of None

Signed-off-by: Seth Foster <fosterbseth@gmail.com>
2023-10-05 22:53:16 -04:00
Seth Foster
441336301e Ensure ip_address is empty string 2023-08-29 13:06:54 -04:00
Seth Foster
81e06dace2 Add listener_port to provision_instance
API changes
- cannot change peers or enable
peers_from_control_nodes on VM deployments
- allow setting ip_address
- use ip_address over hostname in the generated
group_vars/all.yml
- Drop api/v2/peers endpoint

DB changes
- add ip_address unique constraint, but ignore "" entries

Other changes
- provision_instance should take listener_port option

Tests
- test that new control nodes don't disturb other peer
relationships
- test ip_address over hostname
2023-08-29 13:06:54 -04:00
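The "unique but ignore empty strings" rule maps naturally onto a conditional UniqueConstraint; a sketch with a minimal field set:

```python
from django.db import models

class Instance(models.Model):
    hostname = models.CharField(max_length=250, unique=True)
    ip_address = models.CharField(max_length=50, blank=True, default="")

    class Meta:
        constraints = [
            # ip_address must be unique, except for "" placeholder entries
            models.UniqueConstraint(
                fields=["ip_address"],
                condition=~models.Q(ip_address=""),
                name="unique_ip_address_not_empty",  # illustrative name
            ),
        ]
```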
Seth Foster
2a51f23b7d Add functional API tests
add tests for calling write_receptor_config

add write_receptor_config test

Do not set default listener_port on control node
2023-08-29 13:06:54 -04:00
Cesar Francisco San Nicolas Martinez
081206965c Generate random UUID by default for added remote nodes (#14074) 2023-06-06 12:36:28 +02:00
Martin Slemr
05f918e666 HostMetric compliance computation 2023-03-23 14:06:55 -04:00
Jeff Bradberry
8f6849fc22 Include listener_port in the defaults for Instance.objects.register (#13328) 2022-12-19 14:16:05 -03:00
Alan Rominger
a64467c5a6 Shortcut Instance.objects.me when possible 2022-10-05 09:11:42 -04:00
Jeff Bradberry
81e68cb9bf Update node and link registration to put them in the right state
'Installed' for the nodes, 'Established' for the links.
2022-09-23 09:46:11 -04:00
Alan Rominger
cb63d92bbf Remove committed_capacity field, delete supporting code (#12086)
* Remove committed_capacity field, delete supporting code

* Track consumed capacity to solve the negatives problem

* Use more verbose name for IG queryset
2022-04-22 13:41:32 -04:00
Alan Rominger
6c56f2b35b Delete dead code from get_or_register, move, and test 2022-03-30 13:35:42 -04:00
Jeff Bradberry
ac6a82eee4 Merge pull request #11654 from jbradberry/django-3.2-upgrade
Django 3.2 upgrade
2022-03-17 10:34:22 -04:00
Alex Corey
f52ef6e967 Fixes case sensitive host count 2022-03-09 15:36:05 -05:00
Jeff Bradberry
9b6fa55433 Deal with breaking tests for 3.1
- Django's PostgreSQL JSONField wraps values in a JsonAdapter, so deal
  with that when it happens.  This goes away in Django 3.1.
- Setting related *_id fields clears the actual relation field, so
  trying to fake objects for tests is a problem
- Instance.objects.me() was inappropriately creating stub objects
  every time while running tests, but some of our tests now create
  real db objects. Ditch that logic and use a proper fixture where needed.
- awxkit tox.ini was pinned at Python 3.8
2022-03-07 18:11:36 -05:00
Jeff Bradberry
b852baaa39 Fix up logger .warn() calls to use .warning() instead
This is a usage that was deprecated in Python 3.0.
2022-03-07 18:11:36 -05:00
Elijah DeLee
604cbc1737 Consume control capacity (#11665)
* Select control node before start task

Consume capacity on control nodes for controlling tasks and consider
remaining capacity on control nodes before selecting them.

This depends on the requirement that control and hybrid nodes should all
be in the instance group named 'controlplane'. Many tests do not satisfy that
requirement. I'll update the tests in another commit.

* update tests to use controlplane

We don't start any tasks if we don't have a controlplane instance group

Due to updates to fixtures, update tests to set node type and capacity
explicitly so they get the expected result.

* Fixes for accounting of control capacity consumed

The update method is used to account for currently consumed capacity for
instance groups in the in-memory capacity tracking data structure we initialize in
after_lock_init and then update via calculate_capacity_consumed (both in
task_manager.py)

Also update fit_task_to_instance to consider control impact on instances

Trust that these functions do the right thing looking for a
node with capacity, and cut out the redundant check of the whole group's
capacity, per Alan's recommendation.

* Refactor now redundant code

Deal with control type tasks before we loop over the preferred instance
groups, which cuts out the need for some redundant logic.

Also, fix a bug where I missed assigning the execution node in one case!

* set job explanation on tasks that need capacity

move the job explanation for jobs that need capacity to a function
so we can re-use it in the three places we need it.

* project updates always run on the controlplane

Instance group ordering makes no sense on project updates because they
always need to run on the control plane.

Also, since hybrid nodes should always run the control processes for the
jobs running on them as execution nodes, account for this when looking for an
execution node.

* fix misleading message

the variables and wording were both misleading; fix them to give a more
accurate description in the two different cases where this log may be emitted.

* use settings correctly

use settings.DEFAULT_CONTROL_PLANE_QUEUE_NAME instead of a hardcoded
name
cache the controlplane_ig object during after_lock_init to avoid
an unnecessary query
eliminate mistakenly duplicated AWX_CONTROL_PLANE_TASK_IMPACT and use
only AWX_CONTROL_NODE_TASK_IMPACT

* add test for control capacity consumption

add test to verify that when there are 2 jobs and only capacity for one
that one will move into waiting and the other stays in pending

* add test for hybrid node capacity consumption

assert that the hybrid node is used for both control and execution and
capacity is deducted correctly

* add test for task.capacity_type = control

Test that control type tasks have the right capacity consumed and
get assigned to the right instance group

Also fix lint in the tests

* jobs_running not accurate for control nodes

We can either NOT use "idle instances" for control nodes, or we need
to update the jobs_running property on the Instance model to count
jobs where the node is the controller_node.

I didn't do that because it may be an expensive query, and it would be
hard to make it match with jobs_running on the InstanceGroup which
filters on tasks assigned to the instance group.

This change chooses to stop considering "idle" control nodes an option,
since we can't accurately identify them.

Without any change, we continue to over-consume capacity on control nodes,
because this method sees all control nodes as "idle" at the beginning
of the task manager run, and then only counts jobs started in that run
in the in-memory tracking. So jobs which last over a number of task
manager runs build up consumed capacity, which is accurately reported
via Instance.consumed_capacity.

* Reduce default task impact for control nodes

This is something we can experiment with as far as what users
want at install time, but start with just 1 for now.

* update capacity docs

Describe usage of the new setting and the concept of control impact.

Co-authored-by: Alan Rominger <arominge@redhat.com>
Co-authored-by: Rebeccah <rhunter@redhat.com>
2022-02-14 10:13:22 -05:00
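A toy illustration of the accounting this PR introduces (numbers, node names, and field names are illustrative, not AWX internals):

```python
AWX_CONTROL_NODE_TASK_IMPACT = 1  # reduced default, per the last bullet

def consumed_capacity(jobs, node):
    """Sum the control and execution impact a node pays for running jobs."""
    total = 0
    for job in jobs:
        if job["controller_node"] == node:
            total += AWX_CONTROL_NODE_TASK_IMPACT  # control impact
        if job["execution_node"] == node:
            total += job["task_impact"]            # execution impact
    return total

# A hybrid node pays both costs for a job it controls and executes:
jobs = [{"controller_node": "hybrid1", "execution_node": "hybrid1", "task_impact": 3}]
print(consumed_capacity(jobs, "hybrid1"))  # 4 = 3 execution + 1 control
```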
Jeff Bradberry
bb14a95076 Remove the Instance.objects.active_count() method
Literally nothing uses it.  The similar Host.objects.active_count()
method seems to be what is actually important for licensing.
2022-01-14 16:21:41 -05:00
Jeff Bradberry
f340f491dc Control the visibility and use of hop node Instances
- the list, detail, and health check API views should not include them
- the Instance-InstanceGroup association views should not allow them
  to be changed
- the ping view excludes them
- list_instances management command excludes them
- Instance.set_capacity_value sets hop nodes to 0 capacity
- TaskManager will exclude them from the nodes available for job execution
- TaskManager.reap_jobs_from_orphaned_instances will consider hop nodes
  to be an orphaned instance
- The apply_cluster_membership_policies task will not manipulate hop nodes
- get_broadcast_hosts will ignore hop nodes
- active_count also will ignore hop nodes
2021-12-17 14:30:28 -05:00
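The recurring pattern across these changes, sketched (node_type value per the commit; the capacity logic is simplified and Instance import is assumed):

```python
def execution_candidates():
    # TaskManager-style filter: hop nodes never run jobs
    return Instance.objects.exclude(node_type="hop")

def set_capacity_value(instance):
    if instance.node_type == "hop":
        return 0  # hop nodes only route mesh traffic
    return instance.capacity  # simplified; real logic weighs cpu/memory
```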
Elijah DeLee
e10030b73d Allow setting default execution group pod spec
This will allow us to control the default container group created via settings, meaning
we could set this in the operator and the default container group would get created with it applied.

We need this for https://github.com/ansible/awx-operator/issues/242

Deepmerge the default podspec and the override

Without this, providing the `spec` for the podspec would override everything
contained, which ends up including the container used, which is not desired

Also, use the same deepmerge function definition, as the code seems to be
copy-pasted from the utils
2021-12-10 15:02:45 -05:00
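A minimal recursive deep-merge of the kind described, where override values win but nested dicts are merged rather than replaced wholesale:

```python
def deepmerge(base, override):
    if isinstance(base, dict) and isinstance(override, dict):
        merged = dict(base)
        for key, value in override.items():
            merged[key] = deepmerge(base[key], value) if key in base else value
        return merged
    return override

default_pod_spec = {"spec": {"containers": [{"image": "awx-ee:latest"}],
                             "serviceAccountName": "default"}}
override = {"spec": {"serviceAccountName": "custom"}}
# The container list survives, unlike with a shallow dict update:
print(deepmerge(default_pod_spec, override)["spec"]["containers"])
```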
Rebeccah
a9f4011a45 Add defensive code for getting the instance; also simplify nested if
statements, rewrite some comments, and add a logger warning that the instance is being grabbed by the hostname and not the UUID
2021-09-21 16:54:11 -04:00
Rebeccah
55f2125a51 If the user provides a UUID and it exists, allow that to tie to the instance, which allows the user to update the instance based on the UUID (including updating the hostname) should they choose to do so. 2021-09-21 16:54:11 -04:00
Jim Ladd
1b50db26b6 Explicitly pass in UUID to get_or_register
Co-authored-by: Alan Rominger <arominge@redhat.com>
2021-09-14 10:58:29 -07:00
Jim Ladd
f39834ad82 pass uuid to Instance.create 2021-09-03 10:05:15 -07:00
Jim Ladd
bdb13343bb remove unused import 2021-09-03 10:05:15 -07:00
Jim Ladd
262cd3c695 set default uuid 2021-09-03 10:05:15 -07:00
Jim Ladd
f02099e8b7 provision_instance should create new uuid if needed
.. instead of default to current system's UUID

related: #10990
2021-09-03 10:05:15 -07:00
Alan Rominger
daf4310176 Clean up work_type processing and fix execution vs control capacity (#10930)
* Clean up added work_type processing for mesh_code branch

* track both execution and control capacity

* Remove unused execution_capacity property

* Count all forms of capacity to make test pass

* Force jobs to be on execution nodes, updates on control nodes

* Introduce capacity_type property to abstract some details out

* Update test to cover all job types at same time

* Register OpenShift nodes as control types

* Remove unqualified consumed_capacity from task manager and make unit tests work

* Update unit test to execution vs control TM logic changes

* Fix bug, else handling for work_type method
2021-08-26 07:24:14 -04:00
Alan Rominger
928c35ede5 Model changes for instance last_seen field to replace modified (#10870)
* Model changes for instance last_seen field to replace modified

* Break up refresh_capacity into smaller units

* Rename execution node methods, fix last_seen clustering

* Use update_fields to make it clear save only affects capacity

* Restructuring to pass unit tests

* Fix bug where a PATCH did not update capacity value
2021-08-24 08:41:35 -04:00
Alan Rominger
f47eb126e2 Adopt the node_type field in receptor logic (#10802)
* Adopt the node_type field in receptor logic

* Refactor Instance.objects.register so we do not reset capacity to 0
2021-08-24 08:41:34 -04:00
Rebeccah
fd6ce66906 edit the provision_instance awx-manage command to include node_type when provisioning a new instance. 2021-07-26 16:36:44 -04:00
Alan Rominger
8c1bc97c2f Fix up unit tests after tower to controller rename 2021-06-22 10:49:36 -04:00
Shane McDonald
ec8ac6f1a7 Introduce distinct controlplane instance group 2021-06-07 11:25:59 -04:00
Yanis Guenane
82c4f6bb88 Define a DEFAULT_QUEUE_NAME 2021-06-07 11:25:23 -04:00
Ryan Petrello
200901e53b upgrade to partitions without a costly bulk data migration
keep pre-upgrade events in an old table (instead of a partition)

- instead of creating a default partition, keep all events in special
"unpartitioned" tables
- track these tables via distinct proxy=true models
- when generating the queryset for a UnifiedJob's events, look at the
  creation date of the job; if it's before the date of the migration,
  query on the old unpartitioned table, otherwise use the more modern table
  that provides auto-partitioning
2021-06-04 09:17:08 -07:00
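The routing rule can be sketched like this; UnpartitionedJobEvent stands in for the proxy=True models described above, and the migration date here is illustrative:

```python
from datetime import datetime, timezone

PARTITION_MIGRATION_DATE = datetime(2021, 6, 4, tzinfo=timezone.utc)  # illustrative

def event_queryset(job):
    # Jobs created before the migration keep their events in the old
    # unpartitioned table, exposed through a proxy model.
    if job.created < PARTITION_MIGRATION_DATE:
        return UnpartitionedJobEvent.objects.filter(job=job)
    return JobEvent.objects.filter(job=job)
```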
Jeff Bradberry
6a599695db Remove the IsolatedManager and its associated playbooks and plugins 2021-04-22 10:17:02 -04:00
Jeff Bradberry
b0cdfe7625 Clean up the management commands 2021-04-22 10:11:27 -04:00
Ryan Petrello
c2ef0a6500 move code linting to a stricter pep8-esque auto-formatting tool, black 2021-03-23 09:39:58 -04:00
Shane McDonald
1c4a376758 Explicit db field for is_container_group
We now have Container Groups that don't require a credential.
2021-03-15 13:28:39 -04:00
Shane McDonald
373bb443aa UnifiedJob#is_containerized -> UnifiedJob#is_container_group_task 2021-03-03 18:52:55 -05:00
Ryan Petrello
73baf3fcf9 defer loading Job.artifacts on host views to improve performance
see: https://github.com/ansible/awx/issues/8006

this data can be *really* large, and we don't actually need it for the summary fields on this API endpoint
2020-08-26 13:34:15 -04:00
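The deferral itself is one queryset call; a sketch assuming the usual model import:

```python
from awx.main.models import Job  # assumed import path

jobs = Job.objects.defer("artifacts")  # keep the huge column out of the SELECT
for job in jobs:
    print(job.id)      # cheap: artifacts was never fetched
    # job.artifacts    # accessing it would trigger an extra per-row query
```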
Bill Nottingham
a33c303765 Remove active_counts_by_org
I was trying to parse the difference between this and the
(directly above) org_active_count from the comment, and then I
grepped and realized this function appears unused.
2020-06-10 15:25:14 -04:00
Shane McDonald
91dbc2de30 Add queue / instance group registration to heartbeat for k8s installs
There is some history here.

https://github.com/ansible/awx/pull/7190 <- This PR was an attempt at fixing a
bug notting ran into where some jobs on k8s installs would get stuck in Waiting
forever.

The PR mentioned above introduced a bug where there are no instance groups on a
fresh k8s-based install. This is because this process currently happens in the
launch scripts, before the database is up.

With this patch, queue / instance group registration happens in the heartbeat,
right after auto-registering the instance.
2020-06-06 08:58:35 -04:00
Seth Foster
6652464e25 Unset old instance IP when conflicting new instance IP
With AWX_AUTO_DEPROVISION_INSTANCES on, instances
are registered with an IP address. However, new
instances might try to register before old instances
are deprovisioned. In this case old IPs can conflict with
the new ones. This will check for an IP conflict and unset
the IP of the conflicting instance (set to None)

ansible/awx issue 6750
2020-04-28 10:52:15 -04:00
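A sketch of the conflict handling described above (Instance import assumed; this commit clears to None, while the later "Set ip_address to empty string" commit switches to ""):

```python
def clear_conflicting_ip(hostname, ip_address):
    # Any old, not-yet-deprovisioned instance still holding this IP
    # gets its IP unset so the new registration can proceed.
    Instance.objects.filter(ip_address=ip_address).exclude(
        hostname=hostname
    ).update(ip_address=None)
```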