Commit Graph

492 Commits

Author SHA1 Message Date
Bill Nottingham
73b9d25371 Remove gather interval setting
This is a) the wrong settings key name
b) the same as the default in awx/main/conf.py anyway.
2020-05-18 17:55:07 -04:00
Bill Nottingham
73b0506e96 Remove obsolete setting.
This hasn't been used for years now.
2020-05-18 17:41:15 -04:00
Ryan Petrello
c7bb5a3e7b Merge branch 'downstream' into devel 2020-05-15 12:38:47 -04:00
beeankha
479ab8550d Fix misc. linter errors 2020-05-14 15:43:50 -04:00
Seth Foster
4da0e0dd80 Vendor collections for isolated jobs to work in ansible 2.10
kubectl and synchronize are now part of community.kubernetes
and ansible.posix collections, respectively. This change installs
these collections to a local directory to be used in inventory and
isolated management playbooks.

awx issue #6930
2020-05-13 10:41:01 -04:00
chris meyers
216454d298 cleanup channel groups on start
* There are 2 data-structures that django channels redis uses: (1) zset
and (2) list. (1) is used for group membership where the key is the
logic user group and the value(s) are websocket clients. The score of
the zset entry is used for group expiration. We can not rely on group
expiration for clean-up because there is no interface privided by redis
channels to refresh the expiration. Choosing a small value for
group_expiry could result on our websocket backplane group expiring,
which would result in job events not being delivered. Instead, we
increase the group expiration to 5 years and clean up on daphne service
start.
* The list (2) data-structure is used by django channels redis to queue
websocket events per-websocket-client as needed. The need arises to
queue per-websocket-client events when the consumer can not keep up with
the producer. The consumer here is daphne, the producer is AWX.
* When AWX is operating healthy group membership in Redis is reflective
of the real-world. When AWX is unhealthy i.e. daphne cycles, the zset
will contain stale websocket client entries. This can be observed by
running `zrange asgi::group:jobs-status_changed 0 -1`. If the entries
returned look like:
specific.fUkXXpYj!DKOIfwPICNgw
specific.fUkXXpYj!FQcdopZeiRdG
specific.lpTSAgnk!IOKldfzcfdDp
specific.lpTSAgnk!NbvRUZsDpIQx
The entries with `fUkXXpYj` are stale. Note that this changeset fixes
this by removing all `asgi:*` entries on daphne start.
* Also note that individual message themselves have an expiration that
is configurable and defaults to 60.
* Also note that zset's tracking group membership will be deleted by
django channels redis when they are empty.
2020-05-11 11:02:57 -04:00
Ryan Petrello
e51d0b6fde add a setting for enabling high rsyslogd verbosity 2020-04-24 14:01:17 -04:00
softwarefactory-project-zuul[bot]
c0e07198cf Merge pull request #6283 from AlanCoding/vendoring_collections
Use vendored collections for inventory imports

Reviewed-by: https://github.com/apps/softwarefactory-project-zuul
2020-04-23 18:54:50 +00:00
Ryan Petrello
9e30f004d3 let users configure the destination and max disk size of rsyslogd spool 2020-04-20 19:12:28 -04:00
chris meyers
8592bf3e39 better broadcast websocket logging
* Make quiter the daphne logs by raising the level to INFO instead of
DEBUG
* Output the django channels name of broadcast clients. This way, if the
queue gets backed up, we can find it in redis.
2020-04-17 17:19:08 -04:00
AlanCoding
fcf75af6a7 Get current cloud sources working from collection
update test data files

Adopt official vendor location

openstack not published yet

Add collections to show paths

Add collections loc to installer settings

Add vendored collections to show path again
2020-04-16 20:55:59 -04:00
softwarefactory-project-zuul[bot]
eba0e4fd77 Merge pull request #6710 from rooftopcellist/rsyslog_rename_dir
Rename awx rsyslog socket and PID dir

Reviewed-by: https://github.com/apps/softwarefactory-project-zuul
2020-04-15 20:43:40 +00:00
Christian Adams
c8ceb62269 Rename awx rsyslog socket and PID dir 2020-04-15 14:11:15 -04:00
chris meyers
daa312d7ee log file for wsbroadcast 2020-04-14 14:21:23 -04:00
Christian Adams
7fd79b8e54 Remove unneeded logging sock variable 2020-04-13 11:43:59 -04:00
Christian Adams
996d7ce054 Move supervisor and rsyslog sock files to their own dirs under /var/run 2020-04-13 11:43:59 -04:00
Shane McDonald
c0af3c537b Configure rsyslog to listen over a unix domain socket instead of a port
- Add a placeholder rsyslog.conf so it doesn't fail on start
 - Create access restricted directory for unix socket to be created in
 - Create RSyslogHandler to exit early when logging socket doesn't exist
 - Write updated logging settings when dispatcher comes up and restart rsyslog so they  take effect
 - Move rsyslogd to the web container and create rpc supervisor.sock
 - Add env var for supervisor.conf path
2020-04-13 11:43:59 -04:00
Ryan Petrello
589d27c88c POC: replace our external log aggregation feature with rsyslog
- this change adds rsyslog (https://github.com/rsyslog/rsyslog) as
  a new service that runs on every AWX node (managed by supervisord)
  in particular, this feature requires a recent version (v8.38+) of
  rsyslog that supports the omhttp module
  (https://github.com/rsyslog/rsyslog-doc/pull/750)
- the "external_logger" handler in AWX is now a SysLogHandler that ships
  logs to the local UDP port where rsyslog is configured to listen (by
  default, 51414)
- every time a LOG_AGGREGATOR_* setting is changed, every AWX node
  reconfigures and restarts its local instance of rsyslog so that its
  fowarding settings match what has been configured in AWX
- unlike the prior implementation, if the external logging aggregator
  (splunk/logstash) goes temporarily offline, rsyslog will retain the
  messages and ship them when the log aggregator is back online
- 4xx or 5xx level errors are recorded at /var/log/tower/external.err
2020-04-13 11:43:59 -04:00
chris meyers
7433aab258 switch memcached from tcp to unix domain socket 2020-04-06 08:35:12 -04:00
Ryan Petrello
9fe2211f82 get rid of the activity stream middleware
it has bugs and is very confusing

see: https://github.com/ansible/tower/issues/4037
2020-04-01 16:02:42 -04:00
Ryan Petrello
301d6ff616 make the job event bigint migration chunk size configurable 2020-03-27 09:28:10 -04:00
chris meyers
770b457430 redis socket support 2020-03-18 16:10:19 -04:00
chris meyers
d6594ab602 add broadcast websocket metrics
* Gather brroadcast websocket metrics and push them into redis every
configurable seconds.
* Pop metrics from redis in web view layer to display via the api on
demand
2020-03-18 16:10:18 -04:00
chris meyers
b6b9802f9e increase per-channel capacity
* 100 is the default capacity for a channel. If the client doesn't read
the socket fast enough, websocket messages can and will be lost. This
increases the default to 10,000
2020-03-18 16:10:18 -04:00
chris meyers
3b9e67ed1b remove channel group model
* Websocket user session <-> group subscription membership now resides
in Redis rather than the database.
2020-03-18 16:10:18 -04:00
chris meyers
3c5c9c6fde move broadcast websocket out into its own process 2020-03-18 16:10:18 -04:00
chris meyers
03b73027e8 websockets aware of Instance changes
* New tower nodes that are (de)registered in the Instance table are seen
by the websocket layer and connected to or disconnected from by the
websocket broadcast backplane using a polling mechanism.
* This is especially useful for openshift and kubernetes. This will be
useful for standalone Tower in the future when the restarting of Tower
services is not required.
2020-03-18 16:10:17 -04:00
chris meyers
be58906aed remove kombu 2020-03-18 16:10:17 -04:00
chris meyers
dc6c353ecd remove support for multi-reader dispatch queue
* Under the new postgres backed notify/listen message queue, this never
actually worked. Without using the database to store state, we can not
provide a at-most-once delivery mechanism w/ multi-readers.
* With this change, work is done ONLY on the node that requested for the
work to be done. Under rabbitmq, the node that was first to get the
message off the queue would do the work; presumably the least busy node.
2020-03-18 16:10:16 -04:00
chris meyers
2a2c34f567 combine all the broker replacement pieces
* local redis for event processing
* postgres for message broker
* redis for websockets
2020-03-18 16:10:15 -04:00
chris meyers
c8eeacacca POC channels 2 2020-03-18 16:10:12 -04:00
Ryan Petrello
38a08d163c get rid of celery/celerybeat
alternative to https://github.com/ansible/awx/pull/2530 which makes use
of https://pypi.org/project/schedule/

this doesn't have support for any persistence (like how celery beat uses
a shelve file), because all of our periodic jobs run at most every few
minutes
2020-02-10 17:32:02 -05:00
Ryan Petrello
78b00652bd add the ability to enable profiling for the callback receiver workers 2020-01-27 12:03:53 -05:00
softwarefactory-project-zuul[bot]
23b2b136d6 Merge pull request #5707 from AlanCoding/bulk_create_logs
Allow CTiT log level to log bulk_create lines

Reviewed-by: https://github.com/apps/softwarefactory-project-zuul
2020-01-22 15:04:17 +00:00
Bill Nottingham
44e176dde8 Change how analytics is gathered to only gather once per interval 2020-01-21 11:40:51 -05:00
Ben Thomasson
652a428438 Fixes crontab for gather_analytics to run once every 4 hours 2020-01-20 13:30:10 -05:00
AlanCoding
ceed6f8d9b Allow CTiT log level to log bulk_create lines 2020-01-20 12:41:10 -05:00
Ryan Petrello
306f504fb7 optimize the callback receiver to buffer writes on high throughput
additionaly, optimize away several per-event host lookups and
changed/failed propagation lookups

we've always performed these (fairly expensive) queries *on every event
save* - if you're processing tens of thousands of events in short
bursts, this is way too slow

this commit also introduces a new command for profiling the insertion
rate of events, `awx-manage callback_stats`

see: https://github.com/ansible/awx/issues/5514
2020-01-14 12:04:26 -05:00
Seth Foster
7873d08311 Update pip and setuptools in requirements txt
Versions selected to be pre-19 pip
due to unresolved issues with the build systems

Upgrade everything, party on

document new process

rotate license files

fix Swagger schema generation target

Remove --ignore-installed flag
2020-01-07 17:14:32 -06:00
Ryan Petrello
4a6147d4c2 add the ability to generate dot graphs for per-request profiling 2020-01-04 07:09:42 -05:00
Graham Mainwaring
055c02072f Default LOGIN_REDIRECT_URL should be blank, not null 2019-12-18 15:13:46 -05:00
Graham Mainwaring
7700050d10 Add default for LOGIN_REDIRECT_OVERRIDE 2019-12-11 17:21:02 -05:00
Alan Rominger
8e7d607a47 Only turn off Galaxy cert verification via toggle (#3933) 2019-11-19 08:54:40 -05:00
Bill Nottingham
5cdf2f88da Remove admin alerts, there are better mechanisms for this 2019-10-30 19:35:45 -04:00
Graham Mainwaring
b2b33605cc Add UI toggle to disable public Galaxy (#3867) 2019-10-29 11:24:11 -04:00
Ryan Petrello
16812542f8 implement a simple periodic pod reaper for container groups
see: https://github.com/ansible/awx/issues/4911
2019-10-17 17:06:36 -04:00
Shane McDonald
8f75382b81 Implement retry logic for container group pod launches 2019-10-10 15:53:56 -04:00
Christian Adams
844b8a803f update frequency of collection for automation analytics 2019-10-09 21:10:48 -04:00
Ryan Petrello
a076e84a33 add a settings flag for writing all external logs to disk 2019-10-09 13:54:54 -04:00
AlanCoding
c09039e963 Add setting for auth_url
Also adjust public galaxy URL setting to
allow using only the primary Galaxy server

Include auth_url in token exclusivity validation
2019-10-07 14:02:43 -04:00