Feature: retry on subset of jobs hosts

2026-03-18 09:27:31 -02:30 · 2017-10-16 09:58:56 -04:00
parent f1813c35ed
commit 0ae9283fba
7 changed files with 176 additions and 4 deletions
--- a/docs/CHANGELOG.md
+++ b/docs/CHANGELOG.md
@@ -53,3 +53,4 @@
  `deprovision_node` -> `deprovision_instance`, and `instance_group_remove` -> `remove_from_queue`,
  which backward compatibility support for 3.1 use pattern
  [[#6915](https://github.com/ansible/ansible-tower/issues/6915)]
+* Allow relaunching jobs on a subset of hosts, by status.[[#219](https://github.com/ansible/awx/issues/219)]
--- a/docs/retry_by_status.md
+++ b/docs/retry_by_status.md
@@ -0,0 +1,76 @@
+# Relaunch on Hosts with Status
+
+This feature allows the user to relaunch a job, targeting only a subset
+of hosts that had a particular status in the prior job.
+
+### API Design of Relaunch
+
+#### Basic Relaunch
+
+POST to `/api/v2/jobs/N/relaunch/` without any request data should relaunch
+the job with the same `limit` value that the original job used, which
+may be an empty string.
+
+#### Relaunch by Status
+
+Providing request data containing `{"hosts": "<status>"}` should change
+the `limit` of the relaunched job to target the hosts matching that status
+from the previous job (unless the default option of "all" is used).
+The options and meanings of `<status>` include:
+
+ - all: relaunch without changing the job limit
+ - ok: relaunch against all hosts with >=1 tasks that returned the "ok" status
+ - changed: relaunch against all hosts with >=1 tasks had a changed status
+ - failed: relaunch against all hosts with >=1 tasks failed plus all unreachable hosts
+ - unreachable: relaunch against all hosts with >=1 task when they were unreachable
+
+These correspond to the playbook summary states from a playbook run, with
+the notable exception of "failed" hosts. Ansible does not count an unreachable
+event as a failed task, so unreachable hosts can (and often do) have no
+associated failed tasks. The "failed" status here will still target both
+status types, because Ansible will mark the _host_ as failed and include it
+in the retry file if it was unreachable.
+
+### Relaunch Endpoint
+
+Doing a GET to the relaunch endpoint should return additional information
+regarding the host summary of the last job. Example response:
+
+```json
+{
+    "passwords_needed_to_start": [],
+    "retry_counts": {
+        "all": 30,
+        "failed": 18,
+        "ok": 25,
+        "changed": 4,
+        "unreachable": 9
+    }
+}
+```
+
+If the user launches, providing a status for which there were 0 hosts,
+then the request will be rejected.
+
+# Acceptance Criteria
+
+Scenario: user launches a job against host "foobar", and the run fails
+against this host. User changes name of host to "foo", and relaunches job
+against failed hosts. The `limit` of the relaunched job should reference
+"foo" and not "foobar".
+
+The user should be able to provide passwords on relaunch, while also
+running against hosts of a particular status.
+
+Not providing the "hosts" key in a POST to the relaunch endpoint should
+relaunch the same way that relaunching has previously worked.
+
+If a playbook provisions a host, this feature should behave reasonably
+when relaunching against a status that includes these hosts.
+
+Feature should work even if hosts have tricky characters in their names,
+like commas.
+
+Also need to consider case where a task `meta: clear_host_errors` is present
+inside a playbook, and that the retry subset behavior is the same as Ansible
+for this case.