Commit Graph

666 Commits

Author SHA1 Message Date
Jim Ladd
e1f7a7619f Merge pull request #4398 from jladdjr/instance_id_fallback
Instance id fallback
2020-06-11 12:19:23 -07:00
Shane McDonald
85deb8711c Add queue / instance group registration to heartbeat for k8s installs
There is some history here.

https://github.com/ansible/awx/pull/7190 <- This PR was an attempt at fixing a
bug notting ran into where some jobs on k8s installs would get stuck in Waiting
forever.

The PR mentioned above introduced a bug where there are no instance groups on a
fresh k8s-based install. This is because this process currently happens in the
launch scripts, before the database is up.

With this patch, queue / instance group registration happens in the heartbeat,
right after auto-registering the instance.
2020-06-10 16:55:27 -04:00
AlanCoding
1dd9772e41 Allow use of fallback instance_ids 2020-06-09 22:51:42 -07:00
beeankha
85426f76a5 Fix misc. linter errors due to the flake8-3.8.1 release
- [Ref] https://flake8.pycqa.org/en/latest/release-notes/
2020-05-29 17:58:27 -04:00
AlanCoding
fcf75af6a7 Get current cloud sources working from collection
update test data files

Adopt official vendor location

openstack not published yet

Add collections to show paths

Add collections loc to installer settings

Add vendored collections to show path again
2020-04-16 20:55:59 -04:00
chris meyers
1acca459ef nice error message when redis is down
* `awx-manage run_wsbroadcast --status` prints a nice error message if
someone failed to start awx services (e.g. redis)
2020-04-15 13:28:13 -04:00
chris meyers
63f56d33aa show user unsafe name
* We log stats using a safe hostname because of prometheus requirements.
However, when we display the hostname to users we should use the Instance
hostname. This change outputs Instance.hostname instead of the safe
prometheus name.
2020-04-14 16:59:34 -04:00
chris meyers
9cabf3ef4d do not include iso nodes in wsbroadcast status 2020-04-14 16:55:56 -04:00
Ryan Petrello
589d27c88c POC: replace our external log aggregation feature with rsyslog
- this change adds rsyslog (https://github.com/rsyslog/rsyslog) as
  a new service that runs on every AWX node (managed by supervisord)
  in particular, this feature requires a recent version (v8.38+) of
  rsyslog that supports the omhttp module
  (https://github.com/rsyslog/rsyslog-doc/pull/750)
- the "external_logger" handler in AWX is now a SysLogHandler that ships
  logs to the local UDP port where rsyslog is configured to listen (by
  default, 51414)
- every time a LOG_AGGREGATOR_* setting is changed, every AWX node
  reconfigures and restarts its local instance of rsyslog so that its
  forwarding settings match what has been configured in AWX
- unlike the prior implementation, if the external logging aggregator
  (splunk/logstash) goes temporarily offline, rsyslog will retain the
  messages and ship them when the log aggregator is back online
- 4xx or 5xx level errors are recorded at /var/log/tower/external.err
2020-04-13 11:43:59 -04:00
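
The handler described in the commit above can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the AWX implementation: the function name `build_external_logger` and the logger name are hypothetical, and the only detail taken from the commit message is that logs go to a local UDP port (51414 by default) where rsyslog listens.

```python
import logging
import logging.handlers
import socket


def build_external_logger(host="127.0.0.1", port=51414):
    """Return a (logger, handler) pair that ships records over UDP syslog.

    The port default mirrors the one named in the commit message; the
    function and logger names are made up for this sketch.
    """
    handler = logging.handlers.SysLogHandler(
        address=(host, port),
        socktype=socket.SOCK_DGRAM,  # UDP: fire-and-forget to the local daemon
    )
    handler.setFormatter(logging.Formatter("%(name)s %(levelname)s %(message)s"))

    logger = logging.getLogger("external_logger_sketch")
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    logger.propagate = False  # keep these records out of the root logger
    return logger, handler


if __name__ == "__main__":
    logger, _ = build_external_logger()
    # With UDP nothing needs to be listening for this call to succeed;
    # buffering and retrying delivery would be the local daemon's job.
    logger.info("job 42 finished")
```

Because the application only writes to a local socket, retention and retry while the remote aggregator is offline can be left to the forwarding daemon, which is the behaviour the commit message attributes to rsyslog.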
chris meyers
9c6e42fd1b fix spelling mistake in wsbroadcast status output 2020-04-13 09:37:32 -04:00
softwarefactory-project-zuul[bot]
00aa1ad295 Merge pull request #6553 from ryanpetrello/remove-manual-inv-source-for-good
remove deprecated manual inventory source support

Reviewed-by: https://github.com/apps/softwarefactory-project-zuul
2020-04-03 18:09:36 +00:00
Ryan Petrello
8b00b8c9c2 remove deprecated legacy manual inventory source support
see: https://github.com/ansible/awx/issues/6309
2020-04-03 10:54:43 -04:00
chris meyers
8bbae0cc3a color output of ws broadcast connection status 2020-04-02 21:46:12 -04:00
chris meyers
c00f1505d7 add broadcast websocket status command 2020-04-02 21:46:12 -04:00
chris meyers
3326979806 fix register_queue race condition
* Avoid race condition with `apply_cluster_membership_policies`
2020-03-27 16:15:10 -04:00
Ryan Petrello
8f1db173c1 remove a bunch of RabbitMQ references 2020-03-24 18:46:58 -04:00
chris meyers
e9021bd173 serialize register_queue
* also remove unneeded query
2020-03-23 07:21:17 -04:00
Seth Foster
88fb30e0da Delete jobs without loading objects first
This commit is intended to speed up the cleanup_jobs command in awx. The
old method takes 7+ hours to delete 1 million old jobs. The new method
takes around 6 minutes.

Leverages a sub-classed Collector, called AWXCollector, that does not
load in objects before deleting them. Instead querysets, which are
lazily evaluated, are used in places where Collector normally keeps a
list of objects.

Finally, adds a couple of tests to ensure parity between the old Collector
and AWXCollector. That is, any object that is updated/removed from the
database using Collector should have identical operations applied when
using AWXCollector.

tower issue 1103
2020-03-19 14:14:02 -04:00
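
The general idea behind the commit above, deleting rows by query instead of loading each object first, can be illustrated with a small standalone sketch. It uses `sqlite3` from the standard library rather than Django, so it is not the AWXCollector itself; the table names `job` and `job_event`, the `finished` column, and the function names are assumptions made for the example.

```python
import sqlite3


def make_demo_db():
    """Build an in-memory database with a hypothetical job/job_event schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE job (id INTEGER PRIMARY KEY, finished TEXT)")
    conn.execute("CREATE TABLE job_event (id INTEGER PRIMARY KEY, job_id INTEGER)")
    conn.executemany(
        "INSERT INTO job (id, finished) VALUES (?, ?)",
        [(1, "2019-01-01"), (2, "2019-06-01"), (3, "2020-03-01")],
    )
    conn.executemany(
        "INSERT INTO job_event (job_id) VALUES (?)",
        [(1,), (1,), (2,), (3,), (3,)],
    )
    conn.commit()
    return conn


def delete_old_jobs(conn, cutoff):
    """Delete jobs finished before `cutoff`, plus their events.

    No rows are fetched into Python. The dependent rows are described by a
    subquery, which plays the role a lazily evaluated queryset would play.
    """
    with conn:  # one transaction; commits on success, rolls back on error
        conn.execute(
            "DELETE FROM job_event WHERE job_id IN "
            "(SELECT id FROM job WHERE finished < ?)",
            (cutoff,),
        )
        cur = conn.execute("DELETE FROM job WHERE finished < ?", (cutoff,))
    return cur.rowcount


if __name__ == "__main__":
    conn = make_demo_db()
    print(delete_old_jobs(conn, "2020-01-01"), "jobs deleted")
```

The dependent rows are removed before the parent rows so that the subquery still sees the jobs it needs to match against.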
chris meyers
093d204d19 fix flake8 2020-03-18 16:10:19 -04:00
chris meyers
d6594ab602 add broadcast websocket metrics
* Gather broadcast websocket metrics and push them into redis at a
configurable interval (in seconds).
* Pop metrics from redis in web view layer to display via the api on
demand
2020-03-18 16:10:18 -04:00
chris meyers
3c5c9c6fde move broadcast websocket out into its own process 2020-03-18 16:10:18 -04:00
chris meyers
f5193e5ea5 resolve rebase errors 2020-03-18 16:10:17 -04:00
chris meyers
be58906aed remove kombu 2020-03-18 16:10:17 -04:00
chris meyers
dc6c353ecd remove support for multi-reader dispatch queue
* Under the new postgres backed notify/listen message queue, this never
actually worked. Without using the database to store state, we cannot
provide an at-most-once delivery mechanism w/ multi-readers.
* With this change, work is done ONLY on the node that requested the
work. Under rabbitmq, the node that was first to get the
message off the queue would do the work; presumably the least busy node.
2020-03-18 16:10:16 -04:00
chris meyers
2a2c34f567 combine all the broker replacement pieces
* local redis for event processing
* postgres for message broker
* redis for websockets
2020-03-18 16:10:15 -04:00
chris meyers
558e92806b POC postgres broker 2020-03-18 16:10:15 -04:00
chris meyers
355fb125cb redis events 2020-03-18 16:10:15 -04:00
chris meyers
c8eeacacca POC channels 2 2020-03-18 16:10:12 -04:00
Ryan Petrello
5364e78397 switch the periodic scheduler to a child process (instead of a thread)
I have a hunch that our usage of a daemon thread is causing import lock
contention related to https://github.com/ansible/awx/issues/5617
We've encountered similar issues before with threads across dispatcher
processes at fork time, and cpython has had bugs like this in recent
history:

https://bugs.python.org/issue38884

My gut tells me this might be related.

The prior implementation - based on celerybeat - ran its code in
a process (not a thread), and the timing of that merge matches the
period of time we started noticing issues.

Currently testing it to see if it resolves some of the issues we're
seeing.
2020-02-27 12:15:15 -05:00
Bill Nottingham
3e6b6c05a6 Remove the rax support specified in the linked TODO 2020-02-25 15:03:05 -05:00
beeankha
11ccfd8449 Fix misc. linting errors 2020-02-12 12:34:15 -05:00
Ryan Petrello
38a08d163c get rid of celery/celerybeat
alternative to https://github.com/ansible/awx/pull/2530 which makes use
of https://pypi.org/project/schedule/

this doesn't have support for any persistence (like how celery beat uses
a shelve file), because all of our periodic jobs run at most every few
minutes
2020-02-10 17:32:02 -05:00
AlanCoding
3bbce18173 Remove computed fields artifacts no longer used
Remove deleted field from notification payload
2020-02-04 20:23:37 -05:00
Ryan Petrello
f9af5e8959 optimize awx-manage callback_stats for larger datasets
to monitor this historically, we'd probably need to introduce a new
index on the modified column of all our event types
2020-01-22 16:52:38 -05:00
softwarefactory-project-zuul[bot]
aa5532f7b5 Merge pull request #5665 from wenottingham/warn-only
Only warn when license is exceeded non-fatally

Reviewed-by: https://github.com/apps/softwarefactory-project-zuul
2020-01-15 20:16:13 +00:00
Bill Nottingham
bc5ef7f1c8 Only warn when license is exceeded non-fatally 2020-01-15 10:05:20 -05:00
Ryan Petrello
306f504fb7 optimize the callback receiver to buffer writes on high throughput
additionally, optimize away several per-event host lookups and
changed/failed propagation lookups

we've always performed these (fairly expensive) queries *on every event
save* - if you're processing tens of thousands of events in short
bursts, this is way too slow

this commit also introduces a new command for profiling the insertion
rate of events, `awx-manage callback_stats`

see: https://github.com/ansible/awx/issues/5514
2020-01-14 12:04:26 -05:00
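
The write buffering described in the commit above can be sketched in a few lines. This is a generic illustration, not the callback receiver's actual code: the class name `BufferedEventWriter`, the `flush_fn` callback, and the `max_batch` / `max_age` thresholds are all hypothetical.

```python
import time


class BufferedEventWriter:
    """Collect events and hand them to `flush_fn` in batches.

    A flush happens when the buffer reaches `max_batch` events or when
    `max_age` seconds have passed since the last flush, whichever is first.
    """

    def __init__(self, flush_fn, max_batch=100, max_age=1.0):
        self.flush_fn = flush_fn
        self.max_batch = max_batch
        self.max_age = max_age
        self._buffer = []
        self._last_flush = time.monotonic()

    def add(self, event):
        self._buffer.append(event)
        too_many = len(self._buffer) >= self.max_batch
        too_old = (time.monotonic() - self._last_flush) >= self.max_age
        if too_many or too_old:
            self.flush()

    def flush(self):
        # Swap the buffer out before calling flush_fn so the batch handed
        # over is never mutated by later add() calls.
        batch, self._buffer = self._buffer, []
        self._last_flush = time.monotonic()
        if batch:
            self.flush_fn(batch)


if __name__ == "__main__":
    written = []
    writer = BufferedEventWriter(written.append, max_batch=3, max_age=3600)
    for i in range(7):
        writer.add({"counter": i})
    writer.flush()  # push out whatever is left at shutdown
    print([len(batch) for batch in written])
```

In a real receiver, `flush_fn` would be a bulk insert, so one database round trip covers a whole batch of events instead of one save per event.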
Jake McDermott
d91e72c23f Generate new uuid for newly registered iso nodes
When provisioning a new isolated node, generate a new uuid instead of
reusing the SYSTEM_UUID of the controller node.
2020-01-03 12:59:57 -05:00
Ryan Petrello
7396e2e7ac add an awx-manage command for re-generating SECRET_KEY 2019-12-12 16:19:20 -05:00
Ryan Petrello
70979df36a prevent the creation of Host names that contain Jinja 2019-11-13 13:15:36 -05:00
Ryan Petrello
d01088d33e Revert "add support for awx-manage run_callback_receiver --status" 2019-10-18 09:49:02 -04:00
Graham Mainwaring
a038f9fd78 Merge pull request #3845 from ghjm/gather_analytics_dry_run
Add a --dry-run option to gather analytics locally, even if analytics is disabled in settings.
2019-10-17 16:17:18 -04:00
Graham Mainwaring
7dd241fcff Add a --dry-run option to gather analytics locally, even if analytics is disabled in settings. 2019-10-17 13:54:13 -04:00
Ryan Petrello
ffb1707e74 add support for awx-manage run_callback_receiver --status 2019-10-17 11:10:27 -04:00
Christian Adams
c0fd70f189 add mgmt cmd to check db connection 2019-10-01 15:40:43 -04:00
softwarefactory-project-zuul[bot]
b858001c8f Merge pull request #4851 from ryanpetrello/fix-host-key-checking
improve host key checking configurability

Reviewed-by: https://github.com/apps/softwarefactory-project-zuul
2019-09-30 18:38:05 +00:00
Ryan Petrello
82be87566f improve host key checking configurability
see: https://github.com/ansible/tower/issues/3737
2019-09-30 14:13:07 -04:00
Bill Nottingham
fc70d8b321 Adjust help message; we're no longer using the insights client 2019-09-30 12:17:46 -04:00
Ryan Petrello
846e67ee6a update trial license enforcement logic 2019-09-13 12:14:25 -04:00
Ryan Petrello
7a8234bb09 include license data/state in the sosreport 2019-09-03 13:56:02 -04:00