Commit Graph

79 Commits

Author SHA1 Message Date
Alan Rominger
f80bbc57d8 AAP-43117 Additional dispatcher removal simplifications and waiting reaper updates (#16243)
* Additional dispatcher removal simplifications and waiting repear updates

* Fix double call and logging message

* Implement bugbot comment, should reap running on lost instances

* Add test case for new pending behavior
2026-01-26 13:55:37 -05:00
Jake Jackson
36a00ec46b AAP-58539 Move to dispatcherd (#16209)
* WIP First pass
* started removing feature flags and adjusting logic
* Add decorator
* moved to dispatcher decorator
* updated as many as I could find
* Keep callback receiver working
* remove any code that is not used by the call back receiver
* add back auto_max_workers
* added back get_auto_max_workers into common utils
* Remove control and hazmat (squash this not done)
* moved status out and deleted control as no longer needed
* removed unused imports
* adjusted test import to pull correct method
* fixed imports and addressed clusternode heartbeat test
* Update function comments
* Add back hazmat for config and remove baseworker
* added back hazmat per @alancoding feedback around config
* removed baseworker completely and refactored it into the callback
  worker
* Fix dispatcher run call and remove dispatch setting
* remove dispatcher mock publish setting
* Adjust heartbeat arg and more formatting
* fixed the call to cluster_node_heartbeat missing binder
* Fix attribute error in server logs
2026-01-23 20:49:32 +00:00
Alan Rominger
dce5ac73c5 Apply new rules from black update (#16232) 2026-01-19 12:58:07 -05:00
jessicamack
de86b93690 AAP-59874: Update to Python 3.12 (#16208)
* update to Python 3.12

* remove use of utcnow

* switch to timezone.utc

datetime.UTC is an alias of datetime.timezone.utc. if we're doing the double import for datetime it's more straightforward to just import timezone as well and get it directly

* debug python env version issue

* change python version

* pin to SHA and remove debug portion
2026-01-07 11:57:24 -05:00
Alan Rominger
48c7534b57 AAP-60452 Remove the dynamic log level filter for the dispatcherd main process (#16200)
* Remove the dynamic filter on dispatcher startup

Configure the dynamic logging level only on startup

* Special case for log level on settings change

* Add unit test for new behavior

* Add test for initial config

* Mark test django DB

* Do necessary requirement bump

* Delete cache in live test fixture
2026-01-02 15:45:06 -05:00
Seth Foster
2fa2cd8beb Add timeout and on duplicate to system tasks (#16169)
Modify the invocation of @task_awx to accept timeout and
on_duplicate keyword arguments. These arguments are
only used in the new dispatcher implementation.

Add decorator params:
- timeout
- on_duplicate

to tasks to ensure better recovery for
stuck or long-running processes.

---------

Signed-off-by: Seth Foster <fosterbseth@gmail.com>
2025-11-12 23:18:57 -05:00
AlanCoding
55a7591f89 Resolve actions conflicts and delete unwatned files
Bump migrations and delete some files

Resolve remaining conflicts

Fix requirements

Flake8 fixes

Prefer devel changes for schema

Use correct versions

Remove sso connected stuff

Update to modern actions and collection fixes

Remove unwated alias

Version problems in actions

Fix more versioning problems

Update warning string

Messed it up again

Shorten exception

More removals

Remove pbr license

Remove tests deleted in devel

Remove unexpected files

Remove some content missed in the rebase

Use sleep_task from devel

Restore devel live conftest file

Add in settings that got missed

Prefer devel version of collection test

Finish repairing .github path

Remove unintended test file duplication

Undo more unintended file additions
2025-09-17 10:23:19 -04:00
AlanCoding
8fb6a3a633 Merge remote-tracking branch 'tower/test_stable-2.6' into merge_26_2 2025-09-04 23:06:53 -04:00
Alan Rominger
d7ca19f9f0 Clean up logging for receptor down & unregistered cases (#15990)
* Clean up logging when receptor not running

* Make logging more concise for unregistered case

* Silence another unwanted traceback

* Silence a few more tracebacks
2025-06-02 13:51:44 -04:00
Alan Rominger
94764a1f17 AAP-42649 Flag-gated use of "dispatcherd" as its own library (#15981)
Use dynamic AWX max_workers value

Make basic --status and --running commands work

Make feature flag enabled true by default for development

* [dispatcherd] Dispatcher socket-based `--status` demo working (#15908)

* Fix Task Decorator to Work With and Without Feature Flag (AAP-41775) (#15911)

* refactor(system): extract common heartbeat helpers and split cluster_node_heartbeat

Extract common heartbeat logic into helper functions:  _heartbeat_instance_management: consolidates instance management, health checks, and lost-instance detection.  _heartbeat_check_versions: compares instance versions and initiates shutdown when necessary.  _heartbeat_handle_lost_instances: reaps jobs and marks lost instances offline.

Refactor the original cluster_node_heartbeat to use these helpers and retain legacy behavior (using bind_kwargs).

Introduce adispatch_cluster_node_heartbeat for dispatcherd: uses the control API to retrieve running tasks and reaps them.

Link the two implementations by attaching adispatch_cluster_node_heartbeat as the _new_method on cluster_node_heartbeat.

* feat(publish): delegate heartbeat task submission to new dispatcherd implementation

Update apply_async to check at runtime if FEATURE_NEW_DISPATCHER is enabled.

When the task is cluster_node_heartbeat and a _new_method is attached, delegate the task submission to the new dispatcherd implementation.

Preserve the original behavior for all other tasks and fallback on error.

* refactor(system): extract task ID retrieval from dispatcherd into helper function

Improves readability of adispatch_cluster_node_heartbeat by extracting
the complex UUID parsing logic into a dedicated helper function.
Adds clearer error handling and follows established code patterns.

* fix(dispatcher): Enable task decorator to work with and without feature flag

Implemented a new approach for handling task execution with feature flags
by attaching alternative implementations to apply_async._new_method. This
allows cluster_node_heartbeat to work correctly with both the legacy and
new dispatcher systems without modifying core decorator logic.

AAP-41775

* fix(dispatcher): Improve error handling and logging in feature flag implementation

- Add error handling when attaching alternative dispatcher implementation
- Fix method self-reference in apply_async to properly use cls.apply_async
- Document limitations of this targeted approach for specific tasks
- Add logging for better debugging of dispatcher selection
- Ensure decorator timing by keeping method attachment after function definitions

This completes the robust implementation for switching between dispatcher
implementations based on feature flags.

AAP-41775

* fix(dispatcher): Implement registry pattern for dispatcher feature flag compatibility

Replaces direct method attribute assignment with a global registry for
alternative implementations. The original approach tried to attach new
methods directly to apply_async bound methods, which fails because bound
methods don't support attribute assignment in Python.

The registry pattern:
- Creates a global ALTERNATIVE_TASK_IMPLEMENTATIONS dict in publish.py
- Registers alternative implementations by task name
- Modifies apply_async to check the registry when feature flag is enabled
- Adds extensive logging throughout the process for debugging

This enables cluster_node_heartbeat to work correctly with both the legacy
and new dispatcher implementations based on the FEATURE_NEW_DISPATCHER flag.

AAP-41775

* refactor(dispatcher): Remove excessive logging from dispatcher implementation

Reduces verbose debugging logs while maintaining essential logging for critical
operations. Preserves:
- Task implementation selection based on feature flag
- Registration success/failure messages
- Critical error reporting

Removed:
- Registry content debugging messages
- Repetitive task diagnostics
- Non-essential information logging

AAP-41775

* fix(dispatcher): Fix shallow copy in dispatcher schedule conversion

This resolves "AttributeError: 'float' object has no attribute 'total_seconds'"
errors when the dispatcher is restarted.

Refs: AAP-41775

* Use IPC mechanism to get running tasks (#15926)
* Allow tasks from tasks
* Fix failure to limit to waiting jobs
* Get job record with lock
* Fix failures in dispatcherd feature branch (#15930)
* Fully handle DispatcherCancel
* Complete rest of preload import work
* Complete dispatcherd integration & job cancellation (AAP-43033) (#15941)
* feat(dispatcher): Implement job cancellation for new dispatcher

Adds feature-flag-aware job cancellation that routes cancel requests to either
the legacy dispatcher or the new dispatcherd library based on the
FEATURE_NEW_DISPATCHER flag.

- Updates cancel_dispatcher_process() to use dispatcherd's control API when enabled
- Handles both direct cancellation and task manager workflow cancellation cases
- Works with DispatcherCancel exception handling to properly handle SIGUSR1 signals

AAP-43033

* fix(dispatcher): Update run_dispatcher.py to properly handle task cancellation

Modifies the cancel command in run_dispatcher.py to properly cancel tasks
when the FEATURE_NEW_DISPATCHER flag is enabled, rather than just listing
running tasks.

The implementation translates each task UUID to the appropriate
filter format expected by the dispatcherd control API, maintaining the same
behavior as the original implementation.

Part of: AAP-43033

* refactor(system): Refactor dispatch_startup() to extract common startup logic and branch based on feature flag

This commit refactors the dispatch_startup() function to improve clarity and consistency across the legacy
and new dispatcherd flows.

No dispatcher-specific functionality is needed beyond the changes made, so this refactoring improves robustness without
altering core behavior.

* refactor(system): Refactor inform_cluster_of_shutdown() for clarity

* refactor(tasks): Replace @task with @task_awx across 22 tasks for dispatcher compatibility

- Migrated all task decorators to use @task_awx, ensuring dispatcher-aware behavior.
- Tested each task with the new dispatcherd, verifying that tasks using the registry pattern execute correctly without needing binder‐based alternative implementations.
- Removed redundant logging and outdated comments.
- Legacy tasks that do not require special parameter extraction continue to use their original logic.
- This commit reflects our complete journey of testing and verifying dispatcherd compatibility across all 22 tasks.

* refactor(publish): fix linter

* Fix bug from the branch rebase

* AAP-43763 Add tests for connection management in dispatcherd workers (#15949)

* Add test for job cancel in live tests
* Fix bug from the branch rebase
* Add test for connection recovery after connection broke
* Add test for breaking connection

* Fix dispatcherd bugs: schedule aliases, job kwargs handling, cancel handling (#15960)

* Put in job kwargs handling, not done before

* AAP-44382 [dispatcherd] Fixes for running with feature flag off (#15973)

* Use correct decorator for test of tasks

* Finalize dispatcherd feature branch (#15975)

* Work dispatcherd into dependency management system

* Use util methods from DAB

* Rename the dispatcherd feature flag, and flip default to not-enabled

* Move to new submit_task method

* Update the location of the sock file

* AAP-44381 Make dispatcherd config loading more lazy (#15979)

* Make dispatcherd config loading more lazy

* Make submission error more obvious

* Fix signal handling gap, hijack SIGUSR1 from dispatcherd (#15983)

* Fix signal handling gap, hijack SIGUSR1 from dispatcherd

* Minor adjustments to dispatcherd status command

* [dispatcherd] Get rid of alternative task registry (#15984)

Get rid of alternative task registry

* Fix deadlock error and other cleanup errors (#15987)
* Move to proper error handling location

---------

Co-authored-by: artem_tiupin <70763601+art-tapin@users.noreply.github.com>
2025-05-16 09:39:22 -04:00
Alan Rominger
529ee73fcd [4.6] Backport the "live" tests (#6859)
* Create a new pytest folder for live system testing with normal services (#15688)

* PoC for running dev env tests

* Replace in github actions

* Move folder to better location

* Further streamlining of new test folders

* Consolidate fixture, add writeup docs

* Use star import

* Push the wait-for-job to the conftest

Fix misused project cache identifier (#15690)

Fix project cache identifiers for new updates

Finish test and discover viable solution

Add comment on related task code

AAP-37989 Tests for exclude list with multiple jobs (#15722)

* Tests for exclude list with multiple jobs

Create test for using manual & file projects (#15754)

* Create test for using a manual project

* Chang default project factory to git, remove project files monkeypatch

* skip update of factory project

* Initial file scaffolding for feature

* Fill in galaxy and names

* Add README, describe project folders and dependencies

Add ee cleanup tests

* Adds cleanup tests to the live test.

Fix rsyslog permission error in github ubuntu tests from apparmor (#15717)

* Add test to detect rsyslog config problems

* Get dmesg output

* Disable apparmor for rsyslogd

Make awx/main/tests/live dramatically faster (#15780)

* Make awx/main/tests/live dramatically faster

* Add new setting to exclude list

* Fix rebase issues

* Did not want to backport this
2025-02-25 15:22:38 -05:00
Alan Rominger
b502a9444a [4.6 backport] Feature indirect host counting (#15802) (#6858)
* Feature indirect host counting (#15802)

* AAP-37282 Add parse JQ data and test it for a `job` object in isolation (#15774)

* Add jq dependency

* Add file in progress

* Add license for jq

* Write test and get it passing

* Successfully test collection of `event_query.yml` data (#15761)

* Callback plugin method from cmeyers adapted to global collection list

Get tests passing

Mild rebranding

Put behind feature flag, flip true in dev

Add noqa flag

* Add missing wait_for_events

* feat: try grabbing query files from artifacts directory (#15776)

* Contract changes for the event_query collection callback plugin (#15785)

* Minor import changes to collection processing in callback plugin

* Move agreed location of event_query file

* feat: remaining schema changes for indirect host audits (#15787)

* Re-organize test file and move artifacts processing logic to callback (#15784)

* Rename the indirect host counting test file

* Combine artifacts saving logic

* Connect host audit model to jq logic via new task

* Add unit tests for indirect host counting (#15792)

* Do not get django flags from database (#15794)

* Document, implement, and test remaining indirect host audit fields (#15796)

* Document, implement, and test remaining indirect host audit fields

* Fix hashing

* AAP-39559 Wait for all event processing to finish, add fallback task (#15798)

* Wait for all event processing to finish, add fallback task

* Add flag check to periodic task

* feat: cleanup of old indirect host audit records (#15800)

* By default, do not count indirect hosts (#15801)

* By default, do not count indirect hosts

* Fix copy paste goof

* Fix linter issue from base branch

* prevent multiple tasks from processing the same job events, prevent p… (#15805)

prevent multiple tasks from processing the same job events, prevent periodic task from spawning another task per job

* Fix typos and other bugs found by Pablo review

* fix: rely on resolved_action instead of task, adapt to proposed query… (#15815)

* fix: rely on resolved_action instead of task, adapt to proposed query structure

* tests: update indirect host tests

* update remaining queries to new format

* update live test

* Remove polling loop for job finishing event processing (#15811)

* Remove polling loop for job finishing event processing

* Make awx/main/tests/live dramatically faster (#15780)

* AAP-37282 Add parse JQ data and test it for a `job` object in isolation (#15774)

* Add jq dependency

* Add file in progress

* Add license for jq

* Write test and get it passing

* Successfully test collection of `event_query.yml` data (#15761)

* Callback plugin method from cmeyers adapted to global collection list

Get tests passing

Mild rebranding

Put behind feature flag, flip true in dev

Add noqa flag

* Add missing wait_for_events

* feat: try grabbing query files from artifacts directory (#15776)

* Contract changes for the event_query collection callback plugin (#15785)

* Minor import changes to collection processing in callback plugin

* Move agreed location of event_query file

* feat: remaining schema changes for indirect host audits (#15787)

* Re-organize test file and move artifacts processing logic to callback (#15784)

* Rename the indirect host counting test file

* Combine artifacts saving logic

* Connect host audit model to jq logic via new task

* Document, implement, and test remaining indirect host audit fields (#15796)

* AAP-39559 Wait for all event processing to finish, add fallback task (#15798)

* Wait for all event processing to finish, add fallback task

* Add flag check to periodic task

* feat: cleanup of old indirect host audit records (#15800)

* prevent multiple tasks from processing the same job events, prevent p… (#15805)

prevent multiple tasks from processing the same job events, prevent periodic task from spawning another task per job

* Remove polling loop for job finishing event processing (#15811)

* Make awx/main/tests/live dramatically faster (#15780)

* reorder migrations to allow indirect instances backport

* cleanup for rebase and merge into devel

---------

Co-authored-by: Peter Braun <pbraun@redhat.com>
Co-authored-by: jessicamack <jmack@redhat.com>
Co-authored-by: Peter Braun <pbranu@redhat.com>
2025-02-24 21:55:44 +00:00
Alan Rominger
7d30dff075 Feature indirect host counting (#15802)
* AAP-37282 Add parse JQ data and test it for a `job` object in isolation (#15774)

* Add jq dependency

* Add file in progress

* Add license for jq

* Write test and get it passing

* Successfully test collection of `event_query.yml` data (#15761)

* Callback plugin method from cmeyers adapted to global collection list

Get tests passing

Mild rebranding

Put behind feature flag, flip true in dev

Add noqa flag

* Add missing wait_for_events

* feat: try grabbing query files from artifacts directory (#15776)

* Contract changes for the event_query collection callback plugin (#15785)

* Minor import changes to collection processing in callback plugin

* Move agreed location of event_query file

* feat: remaining schema changes for indirect host audits (#15787)

* Re-organize test file and move artifacts processing logic to callback (#15784)

* Rename the indirect host counting test file

* Combine artifacts saving logic

* Connect host audit model to jq logic via new task

* Add unit tests for indirect host counting (#15792)

* Do not get django flags from database (#15794)

* Document, implement, and test remaining indirect host audit fields (#15796)

* Document, implement, and test remaining indirect host audit fields

* Fix hashing

* AAP-39559 Wait for all event processing to finish, add fallback task (#15798)

* Wait for all event processing to finish, add fallback task

* Add flag check to periodic task

* feat: cleanup of old indirect host audit records (#15800)

* By default, do not count indirect hosts (#15801)

* By default, do not count indirect hosts

* Fix copy paste goof

* Fix linter issue from base branch

* prevent multiple tasks from processing the same job events, prevent p… (#15805)

prevent multiple tasks from processing the same job events, prevent periodic task from spawning another task per job

* Fix typos and other bugs found by Pablo review

* fix: rely on resolved_action instead of task, adapt to proposed query… (#15815)

* fix: rely on resolved_action instead of task, adapt to proposed query structure

* tests: update indirect host tests

* update remaining queries to new format

* update live test

* Remove polling loop for job finishing event processing (#15811)

* Remove polling loop for job finishing event processing

* Make awx/main/tests/live dramatically faster (#15780)

* AAP-37282 Add parse JQ data and test it for a `job` object in isolation (#15774)

* Add jq dependency

* Add file in progress

* Add license for jq

* Write test and get it passing

* Successfully test collection of `event_query.yml` data (#15761)

* Callback plugin method from cmeyers adapted to global collection list

Get tests passing

Mild rebranding

Put behind feature flag, flip true in dev

Add noqa flag

* Add missing wait_for_events

* feat: try grabbing query files from artifacts directory (#15776)

* Contract changes for the event_query collection callback plugin (#15785)

* Minor import changes to collection processing in callback plugin

* Move agreed location of event_query file

* feat: remaining schema changes for indirect host audits (#15787)

* Re-organize test file and move artifacts processing logic to callback (#15784)

* Rename the indirect host counting test file

* Combine artifacts saving logic

* Connect host audit model to jq logic via new task

* Document, implement, and test remaining indirect host audit fields (#15796)

* Document, implement, and test remaining indirect host audit fields

* Fix hashing

* AAP-39559 Wait for all event processing to finish, add fallback task (#15798)

* Wait for all event processing to finish, add fallback task

* Add flag check to periodic task

* feat: cleanup of old indirect host audit records (#15800)

* prevent multiple tasks from processing the same job events, prevent p… (#15805)

prevent multiple tasks from processing the same job events, prevent periodic task from spawning another task per job

* Remove polling loop for job finishing event processing (#15811)

* Remove polling loop for job finishing event processing

* Make awx/main/tests/live dramatically faster (#15780)

* temp

* remove test

* reorder migrations to allow indirect instances backport

* cleanup for rebase and merge into devel

---------

Co-authored-by: Peter Braun <pbraun@redhat.com>
Co-authored-by: jessicamack <jmack@redhat.com>
Co-authored-by: Peter Braun <pbranu@redhat.com>
2025-02-24 16:39:51 +00:00
Chris Meyers
ad706d67c2 Add ee cleanup tests
* Adds cleanup tests to the live test.
2025-01-24 10:55:30 -05:00
Alan Rominger
f57a9863d6 Use advisory_lock from DAB (#15676)
* Use advisory_lock from DAB

* Remove the django-pglocks dep

* Re-run updater script

* Move the import in new location
2025-01-15 14:06:59 -05:00
Hao Liu
139d8f0ae2 Add RECEPTOR_KEEP_WORK_ON_ERROR setting
If RECEPTOR_KEEP_WORK_ON_ERROR is set to true receptor work unit will not be automatically released

Co-Authored-By: Chris Meyers <chrismeyersfsu@users.noreply.github.com>
2024-07-22 17:02:37 -04:00
Alan Rominger
fde8af9f11 Fix task ending in error due to bad iterator (#15355) 2024-07-12 13:20:39 -04:00
Alan Rominger
b727d2c3b3 Log conflicts and created items by the periodic resource sync (#15337)
* Initial lazy logging of periodic sync results

* Add desired polish to log

* Add debug log
2024-07-09 15:13:08 -04:00
Hao Liu
6f2307f50e Add TASK_MANAGER_LOCK_TIMEOUT (#15300)
* Add TASK_MANAGER_LOCK_TIMEOUT

`TASK_MANAGER_LOCK_TIMEOUT` controls the `idle_in_transaction_session_timeout` and `idle_session_timeout` configuration for task manager connections and lock in database

hope to prevent the situation that the task instance that holds the lock becomes unresponsive and preventing other instance to be able to run task manager

* Add session timeout to periodic scheduler and all sub task manager locks
2024-06-27 09:42:41 -04:00
Hao Liu
c1dc0c7b86 Periodically sync from share resource provider (#15264)
* Periodically sync from share resource provider

- add periodic task `periodic_resource_sync` run once every 15 min
- if `RESOURCE_SERVER` is not configured sync will not run
- only 1 node

example RESOURCE_SERVER configuration
```
RESOURCE_SERVER = {
    "URL": "<resource server url>",
    "SECRET_KEY": "<resource server auth token>",
    "VALIDATE_HTTPS": <True/False>,
}
RESOURCE_SERVICE_PATH = <resource_service_path>
```
2024-06-10 18:10:57 +00:00
Hao Liu
dd9160135d Prune dangle image periodically (#14957)
Prune dangle image periodically

pairs with https://github.com/ansible/ansible-runner/pull/1342

this fix the problem of us forcefully remove images when setting changing ee image that's being used in a job causing the job to fail
2024-03-12 10:57:57 -04:00
David O Neill
ca8085fe7e English string validation to error code validation 2024-03-11 20:07:16 +00:00
Chris Meyers
8a902debd5 Per-service metrics http server
* Organize metrics into their respective service
* Server per-service metrics on a per-service http server
* Increase prometheus client usage over our custom metrics fields
2024-02-05 15:17:24 -05:00
Seth Foster
c32f234ebb Add peers_from_control_nodes to ReceptorAddress
- write_receptor_config peers to ReceptorAddress entries
that have peers_from_control_nodes enabled

- peers_from_control_nodes and listener_port removed from Instance model

- peers_from_control_nodes added to ReceptorAddress model

- ReceptorAddress is now unique by address and protocol combination

- Write receptor config task is dispatched upon ReceptorAddress creation
or deletion, and when control node is first created

- InstanceLinkSerializer adds a target_address field and has logic
to grab the instance hostname associated with the peered ReceptorAddress

Signed-off-by: Seth Foster <fosterbseth@gmail.com>
2024-02-02 10:37:41 -05:00
Alan Rominger
333ef76cbd Send notifications for dependency failures (#14603)
* Send notifications for dependency failures

* Delete tests for deleted method

* Remove another test for removed method
2023-10-30 10:42:37 -04:00
Lila Yasin
6ce5799689 Incorrect capacity for remote execution nodes 14051 (#14315) 2023-09-05 11:20:36 -04:00
Alan Rominger
ab3ceaecad Remove extra scheduler state save that does nothing (#14396) 2023-08-31 10:35:07 -04:00
Martin Slemr
660dab439b HostMetrics: Hard auto-cleanup (#14255)
Fix host metric settings

Cleanup_host_metric command with default params

Fix order of host metric cleanups
2023-08-30 09:18:59 -04:00
Seth Foster
3e8202590c Remove Disconnected link state
Dynamically flipping from Established
to Disconnected is not the intended
usage of InstanceLink State.

- Link state starts in Adding and becomes
Established once any control node first sees the link
is in the status KnownConnectionCosts
2023-08-29 13:06:54 -04:00
Seth Foster
2bf6512a8e Do not change link state if Removing
inspect_established_receptor_connections should
not change link state is current state is Removing.

Other changes:
- rename inspect_execution_nodes to inspect_execution_and_hop_nodes
- Default link state is Adding
- Set min listener_port value to 1024
- inspect_established_receptor_connections now
runs as part of cluster_node_heartbeat task
2023-08-29 13:06:54 -04:00
Lorenzo Tanganelli
f7fdb7fe8d Add peers readonly api and instancelink constraint (#13916)
Add Disconnected link state

introspect_receptor_connections is a periodic
task that examines active receptor connections
and cross-checks it with the InstanceLink info.

Any links that should be active but are not
will be put into a Disconnected state. If
active, it will be in an Established state.

UI - Add hop creation and peers mgmt (#13922)

* add UI for mgmt peers, instance edit and add

* add peer info on detail and bug fix on detail

* remove unused chip and change peer label

* rename lookup, put Instance type disable on edit

---------

Co-authored-by: tanganellilore <lorenzo.tanagnelli@hotmail.it>
2023-08-29 13:06:54 -04:00
Jeff Bradberry
14992cee17 Add in an async task to migrate the data over 2023-08-10 13:48:58 -04:00
Gabriel Muniz
e7c7454a3a Remove host update code which can be non performant (#14233) 2023-07-24 09:56:40 -04:00
Martin Slemr
6c5590e0e6 HostMetricSummaryMonthly command + views + scheduled task (#13999)
Co-authored-by: Alan Rominger <arominge@redhat.com>
2023-07-12 16:40:09 -04:00
Cesar Francisco San Nicolas Martinez
081206965c Generate random UUID by default for added remote nodes (#14074) 2023-06-06 12:36:28 +02:00
Hao Liu
9870187af5 Fix copy API
In web/task split deployment web and task container no longer share the same redis cache

In the original code we use redis cache to pass the list of sub objects that need to be copied to the new object

In this PR we extracted out the logic that computes the sub_object_list and move it into deep_copy_model_obj task
2023-04-18 16:03:04 -04:00
Martin Slemr
6d4f92e1e8 HostMetric Cleanup task 2023-04-07 08:56:03 -04:00
jessicamack
57d009199d removed unused imports. fix exception message
Signed-off-by: jessicamack <jmack@redhat.com>
2023-03-29 22:09:19 -04:00
Hao Liu
cd3f7666be add get_task_queuename
get_local_queuename will return the pod name of the instance

now that web and task are in different pods when web container queue a task it will be put into a queue without as task worker to execute the task
2023-03-29 22:09:19 -04:00
Jessica Mack
78652bdd71 add functionality back to cache clear method
Signed-off-by: Jessica Mack <jmack@redhat.com>
2023-03-29 22:09:18 -04:00
Jessica Mack
cb31973d59 switched to using the built in task processing
Signed-off-by: Jessica Mack <jmack@redhat.com>
2023-03-29 22:04:43 -04:00
Jessica Mack
d8e591cd69 added cache-clear service. update dispatcher queues
Signed-off-by: Jessica Mack <jmack@redhat.com>
2023-03-29 22:04:43 -04:00
jessicamack
b5e04a4cb3 AWX code changes for rsyslog decoupling (#13222)
* add management command and logging for new daemon
* switch tasks over to calling pg_notify
* add daemon to docker-compose and supervisor
* renamed handle_setting_changes and moved notify call
* removed initial rsyslog configure from dispatcher
* add logging and clear cache before reconfigure
* add notify to delete
* moved pg_notify to own function
* update tests impacted by rsyslog change
* changed over to new pg_notify method

Signed-off-by: Jessica Mack <jmack@redhat.com>
2023-03-29 22:04:43 -04:00
Gabriel Muniz
e15f4de0dd Fix race with heartbeat and reaper logic (#13713)
* Fix race with heartbeat and reaper logic

* Fix tests to fail when over drift over heartbeat time

* replaced modified with started time for reap() code and added test

* fixed logic bug and cleaned up tests

* Added comments to tests to call out reasoning
2023-03-17 14:24:31 -04:00
Alan Rominger
94b34b801c Avoid unbounded kwargs by fetching subtasks inside handle_work_error
Update tests to new handle_work_error call pattern

Handle blame correctly with multiple serial deps
  add new test case corresponding to this scenario
2022-12-19 16:02:51 -05:00
Jeff Bradberry
65179d9cd0 Add a new Instance.health_check_started field
This will enable us to provide more useful information for the user,
now that all user-triggered health checks are async.

Also, de-bounce the health check endpoint to not allow additional
health check tasks to be triggered when one is already in progress.
2022-09-27 17:09:41 -04:00
Jeff Bradberry
0465a10df5 Deal with exceptions when running execution_node_health_check (#12733) 2022-09-23 09:46:12 -04:00
Shane McDonald
9b034ad574 generate control node receptor.conf
when a new remote execution/hop node is added
regenerate the receptor.conf for all control node to
peer out to the new remote execution node

Signed-off-by: Hao Liu <haoli@redhat.com>
Co-Authored-By: Seth Foster <fosterseth@users.noreply.github.com>
Co-Authored-By: Shane McDonald <me@shanemcd.com>
2022-09-23 09:46:12 -04:00
Jeff Bradberry
604fac2295 Update task management to only do things with ready instances 2022-09-23 09:46:11 -04:00
Jeff Bradberry
24bfacb654 Check state when processing receptorctl advertisements
Nodes that show up and were in one of the unready states need to be
transitioned to ready, even if the logic in Instance.is_lost was not
met.
2022-09-23 09:46:11 -04:00