Allow manually running a health check, and make other adjustments to the health check trigger (#11002)

* Fully finalize the planned work for health checks of execution nodes

* Implementation of instance health_check endpoint

* Also do version conditional to node_type

* Do not use receptor mesh to check main cluster nodes health

* Fix bugs from testing health check of cluster nodes, add doc

* Add a few fields to health check serializer missed before

* Light refactoring of error field processing

* Fix errors clearing error, write more unit tests

* Update health check info in docs

* Bump migration of health check after rebase

* Mark string for translation

* Add related health_check link for system auditors too

* Handle health_check cluster node timeout, add errors for peer judgement
Alan Rominger
2021-09-03 16:37:37 -04:00
committed by GitHub
parent 169c0f6642
commit 6a17e5b65b
15 changed files with 285 additions and 53 deletions


@@ -0,0 +1,33 @@
{% ifmeth GET %}
# Health Check Data
Health checks are used to obtain important data about an instance.
Instance fields affected by the health check are shown in this view.
Fundamentally, health checks require running code on the machine in question.
- For instances with `node_type` of "control" or "hybrid", health checks are
performed as part of a periodic task that runs in the background.
- For instances with `node_type` of "execution", health checks are done by submitting
a work unit through the receptor mesh.
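The split described in the list above can be sketched as a small lookup. This is only an illustration, not the project's implementation: the function name and the returned labels are hypothetical, and only the `node_type` values come from the text.

```python
def health_check_mechanism(node_type: str) -> str:
    """Return a label for how a node of this type gets health checked.

    The labels are hypothetical names used for illustration only.
    """
    if node_type in ("control", "hybrid"):
        # Main cluster nodes: checked by a periodic background task.
        return "periodic_task"
    if node_type == "execution":
        # Execution nodes: checked via a work unit sent over the receptor mesh.
        return "receptor_work_unit"
    raise ValueError(f"unknown node_type: {node_type!r}")
```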
If run through the receptor mesh, the invoked command is:
```
ansible-runner worker --worker-info
```
For execution nodes, these checks are _not_ performed on a regular basis.
A health check against a functional node is run when the node is first discovered.
Health checks against nodes with errors are repeated at a reduced frequency.
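As a rough sketch of the policy above for execution nodes, the decision could look like the following. Everything here is an assumption made for illustration: the `ExecutionNode` class, its field names, and `ERROR_RECHECK_INTERVAL` are hypothetical, and the 30-minute value is a placeholder because the actual retry interval is not stated here.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical placeholder; the real "reduced frequency" is not given in the text.
ERROR_RECHECK_INTERVAL = timedelta(minutes=30)


@dataclass
class ExecutionNode:
    # Illustrative fields only, not the actual model definition.
    last_health_check: Optional[datetime] = None
    errors: str = ""


def needs_health_check(node: ExecutionNode, now: datetime) -> bool:
    """Decide whether an automatic health check should be submitted."""
    if node.last_health_check is None:
        # Newly discovered node: check it once.
        return True
    if node.errors:
        # Node with errors: retry, but only after the interval has elapsed.
        return now - node.last_health_check >= ERROR_RECHECK_INTERVAL
    # Functional node that was already checked: no regular re-check.
    return False
```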
{% endifmeth %}
{% ifmeth POST %}
# Manually Initiate a Health Check
For purposes of error remediation or debugging, a health check can be
manually initiated by making a POST request to this endpoint.
This will submit the work unit to the target node through the receptor mesh and wait for it to finish.
The model will be updated with the result.
Up-to-date values of the fields will be returned in the response data.
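A minimal sketch of building such a request with the Python standard library follows. The URL path, the bearer-token header, and the helper name `build_health_check_request` are assumptions made for illustration; adjust them to match your deployment and authentication method.

```python
import urllib.request


def build_health_check_request(base_url: str, instance_id: int, token: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for a manual health check."""
    # Assumed path layout; verify against the related links of your instance.
    url = f"{base_url.rstrip('/')}/api/v2/instances/{instance_id}/health_check/"
    return urllib.request.Request(
        url,
        data=b"{}",  # empty JSON body
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
    )


# To send it (blocks until the health check finishes):
#   import json
#   with urllib.request.urlopen(build_health_check_request(url, 1, token)) as resp:
#       print(json.load(resp))
```

Because the request waits for the work unit to finish, a generous client-side timeout may be appropriate.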
{% endifmeth %}