add docs on debugging job event perf

Chris Meyers
2021-06-03 10:27:32 -04:00
parent c34fa30ea7
commit ae5b11a2a9
3 changed files with 254 additions and 0 deletions

docs/debugging/README.md

@@ -0,0 +1 @@
This folder contains documentation related to tooling and methods that are useful for debugging AWX.


@@ -0,0 +1,253 @@
# AWX Job Event Performance Debugging
> Observed delay in UI job stdout.

The job event critical path: job events start their life in Ansible, specifically in the AWX Ansible callback plugin. AWX creates JSON payloads out of the function calls, and the parameters passed, to the registered callback plugin. The moment the JSON payload is created in the callback plugin is the `created` time on the job event record that is stored in the AWX database. The `modified` time on the job event record is the time the record was stored in the database. The difference `modified - created` is therefore the time it takes for an event to go from Ansible into the AWX database. The pieces along that path are: (1) Ansible, (2) the AWX callback plugin (inside of ansible-runner), (3) push to redis. This ends the processing that happens in the Ansible playbook process space. The work picks back up in the callback receiver process with a (4) pop from redis, (5) bulk insert into postgres, and (6) emit of the event over the websocket via redis. The handling of the event changes process space again here: (7) daphne delivers the event over the websocket(s) to the other AWX nodes and to websocket browser clients, and (8) the UI gets the websocket event and displays it in the job details view. This is what we call the "critical path" for job events.
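As a quick sanity check, the `modified - created` delta for a single event can be inspected from a Django shell. Below is a minimal sketch, assuming the development container and `awx-manage shell_plus`:
```python
# Minimal sketch: measure steps 1-5 (callback plugin -> database insert) for
# the most recent job event. Run inside `awx-manage shell_plus`.
from awx.main.models import JobEvent

event = JobEvent.objects.order_by('-id').first()
if event is not None:
    delta = (event.modified - event.created).total_seconds()
    print(f"event {event.id} ({event.event}): {delta:.3f}s from callback to database")
```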
### Quickly Debugging the Path 4-8
* Login to an AWX instance and open the chrome debug tools.
* Paste in the `debug.js` script found at the bottom of this document.
* You will notice a blue square on the screen with a textbox and a button.
* Launch a job and note the job id.
* Enter the job id into the textbox and click Submit. This will subscribe you to the job events for that job.
* The blue box should update with the current "lag" time in seconds. The lag time is `now - created`. This represents the path from 4-8 listed above.
* Note: use the debug tool while sitting on something OTHER THAN the job details page (path steps 4-7). Then, sit on the job details page with the debug tool running (path steps 4-8). Note the difference.
### Debugging the callback receiver + database performance
```
SELECT event, AVG(EXTRACT(EPOCH FROM (modified-created))) as avg_seconds, MAX(EXTRACT(EPOCH FROM (modified-created))) as max_seconds from main_jobevent group by event;
event | avg_seconds | max_seconds
------------------------+--------------------+-------------
playbook_on_play_start | 0.0160689432989691 | 1.65832
playbook_on_start | 0.0163346391752577 | 1.663806
playbook_on_stats | 17.9721049948454 | 42.367662
playbook_on_task_start | 0.0160336769230769 | 1.532999
runner_item_on_ok | 0.164467420394816 | 42.445095
runner_on_ok | 0.223340121179152 | 42.312954
runner_on_start | 0.151972167806445 | 7.748572
verbose | 0 | 0
(8 rows)
```
The above query is useful to get an idea of how fast events are being "put away" by AWX. You don't want to see averages in the tens of seconds. The above times are "normal".
```
SELECT event, AVG(EXTRACT(EPOCH FROM (modified-created))) as avg_seconds, MAX(EXTRACT(EPOCH FROM (modified-created))) as max_seconds from main_jobevent where job_id >= (select max(job_id) from main_jobevent)-6 group by event;
```
Above is a modification of the previous query, limiting the set of job events considered for averaging to the last 6 jobs that were created. This is useful if you have a reproduction scenario and want to average the event processing time of the last X jobs without the noise of other jobs that have run on the system in the past.
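If direct database access isn't handy, the same aggregation can be run through the Django ORM from `awx-manage shell_plus`. A sketch, assuming the development container:
```python
# Sketch: per-event-type average/max of (modified - created) via the ORM,
# equivalent to the SQL above.
from django.db.models import Avg, DurationField, ExpressionWrapper, F, Max
from awx.main.models import JobEvent

lag = ExpressionWrapper(F('modified') - F('created'), output_field=DurationField())
rows = (JobEvent.objects
        .annotate(lag=lag)
        .values('event')
        .annotate(avg_lag=Avg('lag'), max_lag=Max('lag'))
        .order_by('event'))
for row in rows:
    print(row['event'], row['avg_lag'], row['max_lag'])
```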
```
SELECT * FROM (
SELECT to_char(start_time, 'YYYY-MM-DD HH24:MI'), count(e.modified) AS events, count(e.modified) / 1 as events_per_minute
FROM generate_series(date_trunc('day', localtimestamp - interval '1 hours')
, localtimestamp
, interval '1 min') g(start_time)
LEFT JOIN main_jobevent e ON e.modified >= g.start_time
AND e.modified < g.start_time + interval '1 min'
GROUP BY start_time
ORDER BY start_time ) as s WHERE events != 0;
to_char | events | events_per_minute
------------------+--------+-------------------
2021-04-28 13:06 | 2878 | 2878
2021-04-28 13:07 | 587 | 587
(2 rows)
```
The above is basically a histogram view of the same data provided by `awx-manage callback_stats`: a minute-by-minute history of the rate at which AWX has inserted events into the database.
### Debugging the Websockets
AWX relies on Django Channels and redis for websockets. `channels_redis`, the connection plugin that lets channels use redis as the message backend, allows for setting per-group queue limits. By default, AWX sets this limit to 10,000. It can be tricky to know when this limit is exceeded due to the asynchronous nature of channels + asgi + websockets. One way to be notified of a websocket queue reaching capacity is to hook into the `channels_redis.core` logger. Below is an example of how to enable the logging, along with what capacity overflow messages look like.
```
LOGGING['loggers']['channels_redis.core'] = {
    'handlers': ['console', 'file', 'tower_warnings'],
    'level': 'DEBUG'
}

tail -f /var/log/tower/tower.log
2021-04-28 20:53:51,230 INFO channels_redis.core 1 of 4 channels over capacity in group broadcast-group_send
2021-04-28 20:53:51,231 INFO channels_redis.core 1 of 1 channels over capacity in group job_events-49
```
The two above log messages were not chosen randomly; they are representative of common groups that can overflow. Each AWX node sends all events it processes locally to all other AWX nodes. It does this via the `broadcast-group_send` group, to which all AWX nodes are subscribed. Under high load, this queue can overflow. The other log message above is an overflow in the `job_events-49` group. Overflow in this queue can happen when either AWX or the websocket client cannot keep up with the event websocket messages.
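If a group is chronically over capacity, the per-group limit can be raised in a local settings override. Below is a hedged sketch; the exact `CHANNEL_LAYERS` shape (in particular the `hosts` value) should be copied from your AWX version's default settings rather than from here:
```python
# Sketch of a local settings override raising the channels_redis queue limit.
# The 'hosts' value here is an assumption; copy it from your deployment.
CHANNEL_LAYERS = {
    'default': {
        'BACKEND': 'channels_redis.core.RedisChannelLayer',
        'CONFIG': {
            'hosts': ['unix:///var/run/redis/redis.sock'],
            'capacity': 20000,  # AWX's default per-group limit is 10,000
        },
    },
}
```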
```
redis-cli -s /var/run/redis/redis.sock
redis /var/run/redis/redis.sock> keys *
1) "awx_dispatcher_statistics"
2) "callback_tasks"
3) "broadcast_websocket_stats"
4) "awx_callback_receiver_statistics_105217"
5) "asgi:group:broadcast-group_send"
6) "asgispecific.2061d193ea1c4dd487d8f455dfeabd6a!"
```
Above is an example of all the keys in redis on an AWX node. Let's focus on 5) and 6). 5) is the special group we mentioned above: a redis `ZSET` object created by `channels_redis` to track the django channels clients subscribed to the `broadcast-group_send` group. 6) is the queue that holds websocket messages.
```
redis /var/run/redis/redis.sock> zrange asgi:group:broadcast-group_send 0 -1
1) "specific.2061d193ea1c4dd487d8f455dfeabd6a!20e9f507dd78489b89eb2aeb153d3834"
2) "specific.2061d193ea1c4dd487d8f455dfeabd6a!ea4463175cbb4b04937b98941aae0731"
3) "specific.2061d193ea1c4dd487d8f455dfeabd6a!8b5249bcb61c4026a9d4e341afe98a56"
4) "specific.2061d193ea1c4dd487d8f455dfeabd6a!4854fb8c3d36442d95ff41a34fc5ee16"
redis /var/run/redis/redis.sock> zcount asgispecific.2061d193ea1c4dd487d8f455dfeabd6a! -inf +inf
(integer) 58
redis /var/run/redis/redis.sock> zcount asgispecific.2061d193ea1c4dd487d8f455dfeabd6a! -inf +inf
(integer) 29
redis /var/run/redis/redis.sock> zcount asgispecific.2061d193ea1c4dd487d8f455dfeabd6a! -inf +inf
(integer) 18
redis /var/run/redis/redis.sock> zcount asgispecific.2061d193ea1c4dd487d8f455dfeabd6a! -inf +inf
(integer) 3
redis /var/run/redis/redis.sock> zcount asgispecific.2061d193ea1c4dd487d8f455dfeabd6a! -inf +inf
(integer) 14
redis /var/run/redis/redis.sock> zcount asgispecific.2061d193ea1c4dd487d8f455dfeabd6a! -inf +inf
(integer) 15
```
In the above example we show how to get the list of channels clients subscribed to the `broadcast-group_send` group. We also show how to get the queue depth of a websocket message queue; note how the depth changes between successive `zcount` calls as messages are pushed and consumed.
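The same polling can be scripted. A small sketch using `redis-py`; the `asgispecific...!` key name is copied from the `keys *` output above and will differ on your node:
```python
# Sketch: poll a websocket message queue's depth once a second.
import time

import redis

r = redis.Redis(unix_socket_path='/var/run/redis/redis.sock')
key = 'asgispecific.2061d193ea1c4dd487d8f455dfeabd6a!'  # from `keys *` above
for _ in range(10):
    print(r.zcount(key, '-inf', '+inf'))
    time.sleep(1)
```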
<details><summary>debug.js</summary>
<p>
```javascript
// Copy paste the below script into the console to get a visual gauge of job
// event lag (now - created) for events received over the websocket.
var s;

// Subscribe to job events for the job id entered in the textbox.
function listenJob() {
    var jobid = $('#jobid').val();
    var xrftoken = readCookie('csrftoken');
    s.send(JSON.stringify({"groups":{"jobs":["status_changed","summary"],"job_events": [jobid,],"control":["limit_reached_1"]},"xrftoken": xrftoken}));
}

// Add the blue debug box, textbox, and submit button to the page.
function appendHTML() {
    $('body').append('<div id="wsdebug_wrapper" style="position:fixed; bottom: 0; left: 0"><div id="wsdebug" width="100%" style="background-color: #ABBAEA; font-size: 48px;">Hello World</div><br><input id="jobid" type="text"><input id="dolisten" type="button" value="Submit" onclick="listenJob()"></div>')
}

$(document).ready(function() {
    appendHTML();
    debugConnect();
});

function range_str(start, end) {
    var res = [];
    for (const x of Array(end-start).keys()) {
        res.push((start+x).toString());
    }
    return res;
}

function readCookie(name) {
    var nameEQ = name + "=";
    var ca = document.cookie.split(';');
    for (var i = 0; i < ca.length; i++) {
        var c = ca[i];
        while (c.charAt(0) == ' ') c = c.substring(1, c.length);
        if (c.indexOf(nameEQ) == 0) return c.substring(nameEQ.length, c.length);
    }
    return null;
}

// Connect to the AWX websocket and render running lag statistics.
function debugConnect() {
    var buff = [];
    var buff_max = 1024;
    var saved_max = [[0,0,0], [0,0,0], [0,0,0]];
    var AVG_INDEX = 0;
    var STDEV_INDEX = 1;
    var MAX_INDEX = 2;
    s = new WebSocket("wss://" + window.location.hostname + ":" + window.location.port + "/websocket/");
    s.addEventListener('open', function (event) {
        console.log("Connected to debug websocket");
    });
    s.addEventListener('message', function (event) {
        var e = JSON.parse(event.data);
        if ('created' in e) {
            var now_seconds = Math.round(+new Date()/1000);
            var event_ts_seconds = Math.round(Date.parse(e['created'])/1000);
            var diff = now_seconds - event_ts_seconds;
            buff.push(diff);
            if (buff.length > buff_max) {
                buff.shift();
            }
            var res = buff_calc(buff);
            var avg = res[0];
            var stdev = res[1];
            var max = res[2];
            // Remember the full [avg, stdev, max] triple seen when each
            // individual statistic hit its high-water mark.
            for (var i=0; i < 3; ++i) {
                var entry = saved_max[i];
                if (res[i] > entry[i]) {
                    saved_max[i] = res;
                }
            }
            var str = "<pre>\n";
            str += "Lag " + str_vals(res) + "\n";
            str += "MAX AVERAGE " + str_vals(saved_max[0]) + "\n";
            str += "MAX STDEV " + str_vals(saved_max[1]) + "\n";
            str += "MAX MAX " + str_vals(saved_max[2]) + "\n";
            str += "</pre>";
            $('#wsdebug').html(str);
        }
    });
}

// Compute [average, standard deviation, max] over the buffer of lag samples.
function buff_calc(buff) {
    var total = 0;
    var max = 0;
    for (var i=0; i < buff.length; ++i) {
        total += buff[i];
        if (buff[i] > max) {
            max = buff[i];
        }
    }
    if (total == 0) {
        total = 1;
    }
    var avg = total / buff.length;
    total = 0;
    for (var i=0; i < buff.length; ++i) {
        var u = buff[i] - avg;
        var sq = u*u;
        total += sq;
    }
    if (total == 0) {
        total = 1;
    }
    var stdev = Math.sqrt(total / buff.length);
    return [avg, stdev, max];
}

function str_vals(c) {
    return "avg " + c[0].toString() + " stdev " + c[1].toString() + " max " + c[2].toString();
}
```
</p>
</details>


@@ -0,0 +1,301 @@
Debugging
=========
Django Debug Toolbar (DDT)
----------------
This is a useful tool for examining SQL queries, performance, headers, requests, signals, cache, logging, and more.
To enable DDT, you need to set your `INTERNAL_IPS` to the IP address of your load balancer. Settings can be overridden by creating a new settings file beginning with `local_` in `awx/settings/` (e.g. `local_overrides.py`).
This IP address can be found by making a GET to any page on the browsable API and looking for a line like this in the standard output:
```
awx_1 | 14:42:08 uwsgi.1 | 172.18.0.1 GET /api/v2/tokens/ - HTTP/1.1 200
```
Allow this IP address by adding it to the `INTERNAL_IPS` variable in your new override local settings file, then navigate to the API and you should see DDT on the
right side. If you don't see it, make sure to set `DEBUG=True`.
> Note that enabling DDT is detrimental to the performance of AWX and adds overhead to every API request. It is
> recommended to keep this turned off when you are not using it.
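A hedged sketch of such an override file (the name `local_overrides.py` is just an example; any `local_`-prefixed module in `awx/settings/` is picked up):
```python
# awx/settings/local_overrides.py -- hypothetical override file name.
# Only the settings being overridden need to appear here.
DEBUG = True
INTERNAL_IPS = ['172.18.0.1']  # the client IP observed in the uwsgi log line above
```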
SQL Debugging
-------------
AWX includes a powerful tool for tracking slow queries across all of its Python processes.
As the AWX user, run:
```
$ awx-manage profile_sql --threshold 2 --minutes 5
```
...where `threshold` is the max query time in seconds, and `minutes` is the number of minutes to record.
For the next five minutes (in this example), any AWX Python process that generates a SQL query
that runs for >2s will be recorded in a `.sqlite` database in `/var/log/tower/profile`.
This is a useful tool for logging all queries at a per-process level, or filtering and searching for
queries within a certain code branch. For example, if you observed that certain HTTP requests were
particularly slow, you could enable profiling, perform the slow request, and then search the log:
```
$ sqlite3 -column -header /var/log/tower/profile/uwsgi.sqlite
sqlite> .schema queries
CREATE TABLE queries (
    id INTEGER PRIMARY KEY,
    version TEXT,      -- the AWX version
    pid INTEGER,       -- the pid of the process
    stamp DATETIME DEFAULT CURRENT_TIMESTAMP,
    argv REAL,         -- argv of the process
    time REAL,         -- time to run the query (in seconds)
    sql TEXT,          -- the actual query
    explain TEXT,      -- EXPLAIN VERBOSE ... of the query
    bt TEXT            -- python stack trace that led to the query running
);
sqlite> SELECT time, sql FROM queries ORDER BY time DESC LIMIT 1;
time sql
---------- ---------------------------------------------------------------------------------------------
0.046 UPDATE "django_session" SET "session_data" = 'XYZ', "expire_date" = '2019-02-15T21:56:45.693290+00:00'::timestamptz WHERE "django_session"."session_key" = 'we9dumywgju4fulaxz3oki58zpxgmd6t'
```
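The profile database can also be queried programmatically. A sketch using the stdlib `sqlite3` module; the `bt LIKE` pattern is an illustrative guess at a code branch of interest:
```python
# Sketch: find the slowest recorded queries whose Python stack trace
# passed through a given module path.
import sqlite3

conn = sqlite3.connect('/var/log/tower/profile/uwsgi.sqlite')
rows = conn.execute(
    "SELECT time, sql FROM queries WHERE bt LIKE ? ORDER BY time DESC LIMIT 5",
    ('%awx/api/views%',),  # hypothetical filter; adjust to the suspect code branch
)
for seconds, sql in rows:
    print(f"{seconds:.3f}s  {sql[:120]}")
```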
Postgres Tracing Slow Queries
-----------------------------
Ensure that slow query logging is turned on in the Postgres config and that the log line prefix contains the application name parameter. Below is an example.
```
log_min_duration_statement = 500 # in ms
log_line_prefix = '< %m %a >' # timestamp, application name
```
We've made it easier to correlate postgres connections with the processes that are doing the queries. For example, the following entry, found in the log file `/var/lib/pgsql/data/pg_log/postgresql-Wed.log`, is from a slow-running `awx-manage` command: `< 2021-04-21 12:42:10.307 UTC awx_1-1540765-/bin/awx-manage gather_analytics --dry-run -v 3 >LOG: duration: 1211.270 ms statement: SELECT MIN("main_jobevent"."id") AS "pk__min", MAX("main_jobevent"."id") AS "pk__max" FROM "main_jobevent" WHERE ("main_jobevent"."modified" >= '2021-03-24T12:42:08.846790+00:00'::timestamptz AND "main_jobevent"."modified" <= '2021-04-21T12:42:08.846790+00:00'::timestamptz)`
Note the `application_name` portion. This allows us to trace the query to the node `awx_1` with process id `1540765`. The full task command line string gives us context for each long-running query, so we can find the needle in the haystack without having to go to each individual AWX node and look up the pid in Linux to understand what work each process is doing.
```
awx_1-1540765-/bin/awx-manage gather_analytics --dry-run -v 3
<tower_instance_hostname>-<pid>-<pid_launch_string>
```
This feature is made possible by Postgres's `application_name` connection field, which shows up in `pg_stat_activity`. Below is an example where we query `pg_stat_activity` for sessions whose current query started more than 5 minutes ago.
```
SELECT
now() - pg_stat_activity.query_start AS duration,
query,
state,
application_name
FROM pg_stat_activity
WHERE (now() - pg_stat_activity.query_start) > interval '5 minutes';
```
```
duration | query | state | application_name
-----------------+---------------------------------------------------------------------------------------------------------------+--------+-----------------------------------------------------------------
00:13:17.430778 | LISTEN "ec2-*-**-***-**.compute-1.amazonaws.com"; | idle | ec2-*-**-***-**.compute-1.amazonaws.com-1460120-/usr/bin/awx-ma
00:13:17.430826 | LISTEN "awx_1"; | idle | awx_1-1540753-/usr/bin/awx-manage run_dispatcher
00:13:17.202724 | LISTEN "ec2-*-**-***-**.compute-1.amazonaws.com"; | idle | ec2-*-**-***-**.compute-1.amazonaws.com-1497861-/usr/bin/awx
00:13:13.125703 | COPY (SELECT main_jobevent.id, +| active | awx_1-1540765-/bin/awx-manage gather_analytics --dry-run -v 3
| main_jobevent.created, +| |
| main_jobevent.modified, +| |
| main_jobevent.uuid, +| |
| main_jobevent.parent_uuid, +| |
| main_jobevent.event, +| |
| main_jobevent.event_data::json->'task_action' AS task_action, +| |
| (CASE WHEN event = 'playbook_on_stats' THEN event_data END) as playbook_on_stats, +| |
| main_jobevent.failed, +| |
| main_jobevent.changed, +| |
| main_jobevent.playbook, +| |
| main_jobevent.play, +| |
| main_jobevent.task, +| |
| main_jobevent.role, +| |
| main_jobevent.job_id, +| |
| main_jobevent.host_id, +| |
| main_jobevent.host_name, +| |
| CAST(main_jobevent.event_data::json->>'start' AS TIMESTAMP WITH TIME ZONE) AS start,+| |
| | |
00:13:13.125703 | COPY (SELECT main_jobevent.id, +| active | awx_1-1540765-/bin/awx-manage gather_analytics --dry-run -v 3
| main_jobevent.created, +| |
| main_jobevent.modified, +| |
| main_jobevent.uuid, +| |
| main_jobevent.parent_uuid, +| |
| main_jobevent.event, +| |
| main_jobevent.event_data::json->'task_action' AS task_action, +| |
| (CASE WHEN event = 'playbook_on_stats' THEN event_data END) as playbook_on_stats, +| |
| main_jobevent.failed, +| |
| main_jobevent.changed, +| |
| main_jobevent.playbook, +| |
| main_jobevent.play, +| |
| main_jobevent.task, +| |
| main_jobevent.role, +| |
| main_jobevent.job_id, +| |
| main_jobevent.host_id, +| |
| main_jobevent.host_name, +| |
| CAST(main_jobevent.event_data::json->>'start' AS TIMESTAMP WITH TIME ZONE) AS start,+| |
| | |
```
Remote Debugging
----------------
Python processes in AWX's development environment are kept running in the
background via supervisord. As such, interacting with them via Python's
standard `pdb.set_trace()` isn't possible.
Bundled in our container environment is a remote debugging tool, `sdb`. You
can use it to set remote breakpoints in AWX code and debug interactively over
a telnet session:
```python
# awx/main/tasks.py
class SomeTask(awx.main.tasks.BaseTask):
def run(self, pk, **kwargs):
# This will set a breakpoint and open an interactive Python
# debugger exposed on a random port between 6899-6999. The chosen
# port will be reported as a warning in the AWX logs, e.g.,
#
# [2017-01-30 22:26:04,366: WARNING/Worker-11] Remote Debugger:6900: Please telnet into 0.0.0.0 6900.
#
# You can access it from your host machine using telnet:
#
# $ telnet localhost <port>
import sdb
sdb.set_trace()
```
Keep in mind that when you interactively debug in this way, any process
that encounters a breakpoint will wait until an active client is established
(it won't handle additional tasks) and the debugging session is concluded
with a `continue` command.
To simplify remote debugging session management, AWX's development
environment comes with tooling that can automatically discover open
remote debugging sessions and automatically connect to them. From your *host*
machine (*i.e.*, _outside_ of the development container), you can run:
```
sdb-listen
```
This will open a Python process that listens for new debugger sessions and
automatically connects to them for you.
Graph Jobs
----------
The `awx-manage graph_jobs` command can be used to visualize how Jobs progress from
pending to waiting to running.
```
awx-manage graph_jobs --help
usage: awx-manage graph_jobs [-h] [--refresh REFRESH] [--width WIDTH]
[--height HEIGHT] [--version] [-v {0,1,2,3}]
[--settings SETTINGS] [--pythonpath PYTHONPATH]
[--traceback] [--no-color] [--force-color]
Plot pending, waiting, running jobs over time on the terminal
optional arguments:
-h, --help show this help message and exit
--refresh REFRESH Time between refreshes of the graph and data in
seconds (defaults to 1.0)
--width WIDTH Width of the graph (defaults to 100)
--height HEIGHT Height of the graph (defaults to 30)
```
Below is an example run with 200 Jobs flowing through the system.
[![asciicast](https://asciinema.org/a/xnfzMQ30xWPdhwORiISz0wcEw.svg)](https://asciinema.org/a/xnfzMQ30xWPdhwORiISz0wcEw)
Code Instrumentation Profiling
------------------------------
Decorate a function to generate profiling data that will tell you the percentage
of time spent in branches of a code path. This comes at an absolute performance
cost. However, the relative numbers are still very helpful.
Using `dot_enabled=True` requires the `gprof2dot` package, installable with the command below.
**Note:** The profiling code will run as if `dot_enabled=False` when the `gprof2dot`
package is not found.
```
/var/lib/awx/venv/awx/bin/pip3 install gprof2dot
```
Below is the signature of the `@profile` decorator.
```
@profile(name, dest='/var/log/tower/profile', dot_enabled=True)
```
```python
from awx.main.utils.profiling import profile
@profile(name="task_manager_profile")
def task_manager():
...
```
Now, invoke the function being profiled. Each run of the profiled function
will result in a file output to `dest` containing the profile data summary as
well as a dot graph if enabled. The profile data summary can be viewed in a
text editor. The dot graph can be viewed using `xdot`.
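If `xdot` isn't available, the `.pstats` file can also be inspected with the Python stdlib. A sketch (the filename is copied from the listing below):
```python
# Sketch: print the top functions by cumulative time from a .pstats file.
import pstats

stats = pstats.Stats('/var/log/tower/profile/2.001s-task_manager_profile-2303-272858af-3bda-45ec-af9e-7067aa86e4f3.pstats')
stats.sort_stats('cumulative').print_stats(10)
```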
```
bash-4.4$ ls -aln /var/log/tower/profile/
total 24
drwxr-xr-x 2 awx root 4096 Oct 15 13:23 .
drwxrwxr-x 1 root root 4096 Oct 15 13:23 ..
-rw-r--r-- 1 awx root 635 Oct 15 13:23 2.001s-task_manager_profile-2303-272858af-3bda-45ec-af9e-7067aa86e4f3.dot
-rw-r--r-- 1 awx root 587 Oct 15 13:23 2.001s-task_manager_profile-2303-272858af-3bda-45ec-af9e-7067aa86e4f3.pstats
-rw-r--r-- 1 awx root 632 Oct 15 13:23 2.002s-task_manager_profile-2303-4cdf4660-3ef4-4238-8164-33611822d9e3.dot
-rw-r--r-- 1 awx root 587 Oct 15 13:23 2.002s-task_manager_profile-2303-4cdf4660-3ef4-4238-8164-33611822d9e3.pstats
```
```
xdot /var/log/tower/profile/2.001s-task_manager_profile-2303-272858af-3bda-45ec-af9e-7067aa86e4f3.dot
```
Code Instrumentation Timing
---------------------------
Similar to the profiling decorator, there is a timing decorator. This is useful
when you do not want to incur the overhead of profiling and want to know the
accurate absolute timing of a code path.
Below is the signature of the `@timing` decorator.
```
@timing(name, dest='/var/log/tower/timing')
```
```python
from awx.main.utils.profiling import timing
@timing(name="my_task_manager_timing")
def task_manager():
...
```
Now, invoke the function being timed. Each run of the timed function will result
in a file output to `dest`. The timing data will be in the file name.
```
bash-4.4# ls -aln
total 16
drwxr-xr-x 2 0 0 4096 Oct 20 12:43 .
drwxrwxr-x 1 0 0 4096 Oct 20 12:43 ..
-rw-r--r-- 1 0 0 61 Oct 20 12:43 2.002178-seconds-my_task_manager-ab720a2f-4624-47d0-b897-8549fe7e8c99.time
-rw-r--r-- 1 0 0 60 Oct 20 12:43 2.00228-seconds-my_task_manager-e8a901be-9cdb-4ffc-a34a-a6bcb4266e7c.time
```
The class behind the decorator can also be used for profiling.
```python
from awx.main.utils.profiling import AWXProfiler
prof = AWXProfiler("hello_world")
prof.start()
'''
code to profile here
'''
prof.stop()
# Note that start() and stop() can be reused. A new profile file will be output each time.
prof.start()
'''
more code to profile
'''
prof.stop()
```