19 Commits

Author SHA1 Message Date
Chris Meyers
7b3fb2c2a8 Add example grafana dashboard
* Per-service log view
2024-05-31 13:55:17 -04:00
Chris Meyers
0eb465531c Centralized logging via otel 2024-05-31 13:55:17 -04:00
Chris Meyers
ae1235b223 Rename container hostname from awx_1 to awx-1
* Django and other webservers that care about proper hostnames don't
  like underscores in them.
2024-04-03 15:58:17 -04:00
Alan Rominger
ef99770383
Add subsystem metrics for the dispatcher (#13989)
This adds a handful of metrics to /api/v2/metrics/ recorded from the dispatcher main process

Adds logic in the dispatcher period tasks to calculate these for the last collection interval
Reports worker count, task count, scale up events, and availability

Add data to demo grafana dashboard
2023-05-17 14:29:31 -04:00
Elijah DeLee
d50c97ae22
Updates to Grafana Dashboard and example alerts
More fun in the grafana dashboard. The rows organize the panels and are
collapsable. Also, tested with multiple nodes and fixed some
labeling issues when there are more than one node.

Update grafana alerting readme info and some fun prose about one of the
alerts as well as some reorganizing of the code for clarity.

finally, drop the time to fire for alerts because it's better to have them be a bit touchy so users can verify they work vs. not being sure.
2022-10-11 11:14:22 -04:00
Elijah DeLee
8333b0cf66
fix name to be consistent (#12975)
* fix name to be consistent

this is not a mean, its the last value
so say that in the name

* add remaining capacity to dashboard

also make legends pretty with nice names
2022-09-29 16:52:12 -04:00
Elijah DeLee
d9f5193a18
move grafana/prometheus docs to own README (#12960)
* move grafana/prometheus docs to own README
2022-09-28 14:05:05 -04:00
Rebeccah
eaad749cc9
I broke grafana with my rename, so now I'm fixing it, and adding a better name in overall that is less focused on alerts. 2022-09-27 11:58:43 -04:00
Rebeccah
88f0ab0233
add new alert rule for when error rate is over a certain rate, also fix
typo in URL and in grafana alert rule

Important learning: no newlines in rules/equations

turns out datasourceUid can be set in prometheus_source.yml, and it can be anything we want. So I have set it to awx_alert, the PBFAnumbersetc value it was set to before was an autogenerated UID, and it would actually work just with that generated value, but because we want it to make sense, we're setting the value in prometheus_source.yml

finally, update the docs to be reflective of grafana docs and how to export new rules a user might want to add.

Co-authored-by: Elijah DeLee <kdelee@redhat.com>
2022-09-23 15:05:57 -04:00
Elijah DeLee
461b5221f3 Add graphs for job event processing to dashboard 2022-09-14 16:23:53 -04:00
Elijah DeLee
10d06f219d add alerting rule to grafana
This rule alerts if the redis queue is larger than what the rolling
average event insertion rate/second * 120. In other words, if the redis
queue is larger than it appears we can process events in two minutes.

It appears it has to meet this condition for 60 seconds to start firing.

Future commits will address how to configure contact points like slack.

shout out to @jainnikhil30 and @rebeccahhh who figured this out in jam
session this morning.
2022-09-14 16:23:53 -04:00
Alan Rominger
53de245877
Fix LDAP volume conditional, better metrics interval 2022-09-04 22:33:12 -04:00
Alan Rominger
ccbc8ce7de
Make the metrics default sampling interval 5s 2022-09-02 13:38:49 -04:00
Elijah DeLee
125801ec5b add panel to grafana dashboard for capacity
also reorganize so there are two columns of panels, not
just one long skinny set of panels
2022-08-26 15:42:40 -04:00
Alan Rominger
11e63e2e89
Remove an old metrics field and add a new one to dashboard 2022-08-16 22:37:27 -04:00
Alan Rominger
f6da9a5073
Add more graphs for task manager refactor 2022-08-15 15:29:34 -04:00
Alan Rominger
3aa8320fc7
Add a graph to show database connections being used 2022-07-28 11:52:36 -04:00
Seth Foster
2f82b75748
Add subsystem metrics for task manager 2022-06-14 11:00:11 -04:00
Seth Foster
6f68f3cba6
Add make prometheus and make grafana commands to dev environment 2022-05-31 17:07:15 -04:00