Merge pull request #3935 from jladdjr/at-3532-typo

Minor typo fix (pointing workflow node template at unsupported unified job template)
Matthew Jones
2016-11-18 15:45:34 -05:00
committed by GitHub
12 changed files with 23 additions and 23 deletions

@@ -161,7 +161,7 @@ When verifying acceptance we should ensure the following statements are true
Job failures during the time period should be predictable and not catastrophic.
* Node downtime testing should also include recoverability testing: kill single services and ensure the system can
return itself to a working state
-* Persistent failure should be tested by killing single services in such a way that the cluster node can not be recovered
+* Persistent failure should be tested by killing single services in such a way that the cluster node cannot be recovered
and ensuring that the node is properly taken offline
* Network partitioning failures will be important also. In order to test this
- Disallow a single node from communicating with the other nodes but allow it to communicate with the database

@@ -6,7 +6,7 @@ Independent jobs are ran in order of creation time, earliest first. Jobs with de
## Task Manager Architecture
-The task manager has a single entry point, `Scheduler().schedule()`. The method may be called in parallel, at any time, as many times as the user wants. The `schedule()` function tries to acquire a single, global lock using the first record of the Instance table in the database. If the lock can not be acquired, the method returns. The failure to acquire the lock indicates that there is another instance currently running `schedule()`.
+The task manager has a single entry point, `Scheduler().schedule()`. The method may be called in parallel, at any time, as many times as the user wants. The `schedule()` function tries to acquire a single, global lock using the first record of the Instance table in the database. If the lock cannot be acquired, the method returns. The failure to acquire the lock indicates that there is another instance currently running `schedule()`.
### Hybrid Scheduler: Periodic + Event
The `schedule()` function is run (a) periodically by a celery task and (b) on job creation or completion. The task manager system would behave correctly if run exclusively via (a) or (b). We chose to trigger `schedule()` via both mechanisms because each has a useful property. (b) reduces the time from launch to running, resulting in a better user experience. (a) is a fail-safe in case we miss code paths, now or in the future, that change the 3 scheduling considerations for which we should call `schedule()` (e.g. adding new nodes to Tower changes the capacity, or obscure job error handling fails a job).