There is some history here.
https://github.com/ansible/awx/pull/7190 <- This PR was an attempt at fixing a
bug notting ran into where some jobs on k8s installs would get stuck in Waiting
forever.
The PR mentioned above introduced a bug where there are no instance groups on a
fresh k8s-based install. This is because this process currently happens in the
launch scripts, before the database is up.
With this patch, queue / instance group registration happens in the heartbeat,
right after auto-registering the instance.
update test data files
Adopt official vendor location
openstack not published yet
Add collections to show paths
Add collections loc to installer settings
Add vendored collections to show path again
* We log stats using a safe hostname because of prometheus requirements.
However, when we display users the hostname we should use the Instance
hostname. This change outputs the Instance.hostname instead of the safe
prometheus name.
- this change adds rsyslog (https://github.com/rsyslog/rsyslog) as
a new service that runs on every AWX node (managed by supervisord)
in particular, this feature requires a recent version (v8.38+) of
rsyslog that supports the omhttp module
(https://github.com/rsyslog/rsyslog-doc/pull/750)
- the "external_logger" handler in AWX is now a SysLogHandler that ships
logs to the local UDP port where rsyslog is configured to listen (by
default, 51414)
- every time a LOG_AGGREGATOR_* setting is changed, every AWX node
reconfigures and restarts its local instance of rsyslog so that its
fowarding settings match what has been configured in AWX
- unlike the prior implementation, if the external logging aggregator
(splunk/logstash) goes temporarily offline, rsyslog will retain the
messages and ship them when the log aggregator is back online
- 4xx or 5xx level errors are recorded at /var/log/tower/external.err
The commit is intended to speed up the cleanup_jobs command in awx. Old
methods takes 7+ hours to delete 1 million old jobs. New method takes
around 6 minutes.
Leverages a sub-classed Collector, called AWXCollector, that does not
load in objects before deleting them. Instead querysets, which are
lazily evaluated, are used in places where Collector normally keeps a
list of objects.
Finally, a couple of tests to ensure parity between old Collector and
AWXCollector. That is, any object that is updated/removed from the
database using Collector should be have identical operations using
AWXCollector.
tower issue 1103
* Gather brroadcast websocket metrics and push them into redis every
configurable seconds.
* Pop metrics from redis in web view layer to display via the api on
demand
* Under the new postgres backed notify/listen message queue, this never
actually worked. Without using the database to store state, we can not
provide a at-most-once delivery mechanism w/ multi-readers.
* With this change, work is done ONLY on the node that requested for the
work to be done. Under rabbitmq, the node that was first to get the
message off the queue would do the work; presumably the least busy node.
I have a hunch that our usage of a daemon thread is causing import lock
contention related to https://github.com/ansible/awx/issues/5617
We've encountered similar issues before with threads across dispatcher
processes at fork time, and cpython has had bugs like this in recent
history:
https://bugs.python.org/issue38884
My gut tells me this might be related.
The prior implementation - based on celerybeat - ran its code in
a process (not a thread), and the timing of that merge matches the
period of time we started noticing issues.
Currently testing it to see if it resolves some of the issues we're
seeing.
additionaly, optimize away several per-event host lookups and
changed/failed propagation lookups
we've always performed these (fairly expensive) queries *on every event
save* - if you're processing tens of thousands of events in short
bursts, this is way too slow
this commit also introduces a new command for profiling the insertion
rate of events, `awx-manage callback_stats`
see: https://github.com/ansible/awx/issues/5514