* trigger via jobs/<id>/job_events/?limit=10
* Can and should be used in conjunction with an indexed set of fields to
generate efficient pagination queries. i.e.
jobs/<id>/job_events?limit=10&start_line__gte=10
* If limit is not specified in the query params then the default
pagination will be used.
* Before, we would get the min and max pk of the set we are to gather.
This changeset removes that.
* Before, we would, basically, know the size of the set we are to gather
and would query 100,000 of those job event records at a time. That logic
is now gone.
* Now, for unpartitioned job events we gather 4 hours at a time by
created time.
* Now, for partitioned job events we gather 4 hours at a time by
modified time.
* The order by results in an in-memory sort that COULD blow out the
worker mem buffer and result in sorting having to take place on disk.
* This WILL happen with a default postgres 4MB mem buffer. We saw as
much as 20MB used. Note that AWX defaults postgres mem worker buffer to
3% of the DB memory on external installs and 1% on same-node installs.
So for a 16GB remote DB this would not be a problem.
* We are going to avoid this problem all together by NOT doing a sort
when gathering. Instead, we will sort remotely, in analytics.
* Old, _unpartitioned_main_jobevent table does not have the job_created
column
* New, main_jobevent does.
* Always in clude the job_created column. NULL if old, job_created if
new
* Bump events_table schema version from 1.2 to 1.3 because of the
job_created field
* We found that having multiple `::json` casts in a query slows down
queries more and more by =~> 33%.
* This change coerces postgres into only casting once. Micro
benchmarking shows =~ 2-3x performance boost
* pre-migration jobevents live in unpartitioned table
where only created field has index
* post-migration jobevents live in partitions
where modified field has index
(and should be used to ensure no events are missing)
* when collecting job events by creation time
it is possible to miss events that were created
at one point, but actually committed to the db
much later.
* since events' modified time is set when they are
committed to the db, we shouldn't miss any job events
* selecting job events by modified time wasn't possible
beforehand because we didn't have an index for
jobevent's modified field
Changes in old unpartitioned cleanup logic:
* Manually cascade delete events related to job(s)
(new partitions cleanup logic) For each event type:
* Get the event partitions that are within the cleanup date range
* Get a list of jobs to delete that are in the cutoff range.
* Jobs that are running, pending, or waiting in the job list are special.
* Use the special list to further filter the partition drop list.
* Drop partitions
* delete jobs
* when tests create a UnifiedJob and JobEvent,
the two need to have the same value for job creation time
* some view validation was skipped due to `model` being
a property in some cases now
we don't need this code at all anymore - the bigint migration is long
gone, and anybody upgrading to this version of AWX has already
migrated their data
keep pre-upgrade events in an old table (instead of a partition)
- instead of creating a default partition, keep all events in special
"unpartitioned" tables
- track these tables via distinct proxy=true models
- when generating the queryset for a UnifiedJob's events, look at the
creation date of the job; if it's before the date of the migration,
query on the old unpartitioned table, otherwise use the more modern table
that provides auto-partitioning
* if we use the actual old job events table
and make tweaks to its schema
namely, dropping the pkey constraint,
then when we go to migrate the old job events
we will be forcing postgres to do a sequential scan
on the old table, which effectively causes the migration to hang
events that existed *prior* to the partition migration will have
`job_created=1970-01-01` auto-applied at migration time; as such,
queries for these events e.g., /api/v2/job/N/job_events/
use 1970-01-01 in related event searche
events created *after* the partition migration (net-new playbook runs
will have `job_created` values that *exactly match* the related
`UnifiedJob.created` field.