High-availability guide restructuring

* Refactor high-availability guide to include both single- and multi-cluster architectures

Closes #30095
Closes #41585

Signed-off-by: Ryan Emerson <remerson@ibm.com>
Signed-off-by: Alexander Schwartz <aschwart@redhat.com>
Signed-off-by: Alexander Schwartz <alexander.schwartz@gmx.net>
Co-authored-by: Alexander Schwartz <aschwart@redhat.com>
Co-authored-by: Alexander Schwartz <alexander.schwartz@gmx.net>
This commit is contained in:
Ryan Emerson 2025-08-06 19:38:37 +01:00 committed by GitHub
parent 84fc9bb3e5
commit 907ee2e4e2
61 changed files with 1450 additions and 771 deletions

View File

@ -32,3 +32,4 @@
:upgrading_guide_link: {project_doc_base_url}/upgrading/
:kc_js_path: /js
:kc_realms_path: /realms
:kubernetes: Kubernetes

View File

@ -15,17 +15,17 @@ After starting {project_name}, consider adapting your instance to the required l
== Vertical Scaling
As you monitor your {project_name} workload, check to see if the CPU or memory is under or over utilized. Consult <@links.ha id="concepts-memory-and-cpu-sizing" /> to better tune the resources available to the Java Virtual Machine (JVM).
As you monitor your {project_name} workload, check to see if the CPU or memory is under or over utilized. Consult <@links.ha id="single-cluster-concepts-memory-and-cpu-sizing" /> to better tune the resources available to the Java Virtual Machine (JVM).
Before increasing the amount of memory available to the JVM, in particular when experiencing an out of memory error, it is best to determine what is contributing to the increased footprint using a heap dump. Excessive response times may also indicate the HTTP work queue is too large and tuning for load shedding would be better than simply providing more memory. See the following section.
=== Common Tuning Options
{project_name} automatically adjusts the number of used threads based upon how many cores you make available. Manually changing the thread count can improve overall throughput. For more details, see <@links.ha id="concepts-threads" />. However, changing the thread count must be done in conjunction with other JVM resources, such as database connections; otherwise, you may be moving a bottleneck somewhere else. For more details, see <@links.ha id="concepts-database-connections" />.
{project_name} automatically adjusts the number of used threads based upon how many cores you make available. Manually changing the thread count can improve overall throughput. For more details, see <@links.ha id="single-cluster-concepts-threads" />. However, changing the thread count must be done in conjunction with other JVM resources, such as database connections; otherwise, you may be moving a bottleneck somewhere else. For more details, see <@links.ha id="single-cluster-concepts-database-connections" />.
To limit memory utilization of queued work and to provide for load shedding, see <@links.ha id="concepts-threads" anchor="load-shedding" />.
To limit memory utilization of queued work and to provide for load shedding, see <@links.ha id="single-cluster-concepts-threads" anchor="single-cluster-load-shedding" />.
If you are experiencing timeouts in obtaining database connections, you should consider increasing the number of connections available. For more details, see <@links.ha id="concepts-database-connections" />.
If you are experiencing timeouts in obtaining database connections, you should consider increasing the number of connections available. For more details, see <@links.ha id="single-cluster-concepts-database-connections" />.
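For example, when running with the {project_name} Operator, one way to size the pool is through the `db.pool*` fields of the Keycloak CR. The following fragment is a minimal sketch; the value of 30 connections is illustrative and should come from your own load tests, and the initial, minimum, and maximum sizes are kept identical so that database statement caching stays effective.
[source,yaml]
----
spec:
  db:
    poolInitialSize: 30 # keep initial, min and max identical to retain statement caching
    poolMinSize: 30
    poolMaxSize: 30
----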
=== Vertical Autoscaling
@ -37,19 +37,19 @@ A single {project_name} instance is susceptible to availability issues. If the i
A single JVM has a limit on how many concurrent requests it can handle. Additional server instances can provide roughly linear scaling of throughput until associated resources, such as the database or distributed caching, limit that scaling.
In general, consider allowing the {project_name} Operator to handle horizontal scaling concerns. When using the Operator, set the Keycloak custom resource `spec.instances` as desired to horizontally scale. For more details, see <@links.ha id="deploy-keycloak-kubernetes" />.
In general, consider allowing the {project_name} Operator to handle horizontal scaling concerns. When using the Operator, set the Keycloak custom resource `spec.instances` as desired to horizontally scale. For more details, see <@links.ha id="single-cluster-deploy-keycloak" />.
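As a minimal sketch, horizontal scaling with the Operator is a single field on the Keycloak CR; the instance count shown here is an example value, not a recommendation.
[source,yaml]
----
spec:
  instances: 3 # number of Keycloak Pods managed by the Operator
----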
If you are not using the Operator, please review the following:
* Higher availability is possible of your instances are on separate machines. On Kubernetes, use Pod anti-affinitity to enforce this.
* Higher availability is possible if your instances are on separate machines. On Kubernetes, use Pod anti-affinity to enforce this.
* Use distributed caching; for multi-site clusters, use external caching for cluster members to share the same state. For details on the relevant configuration, see <@links.server id="caching" />. The embedded Infinispan cache has horizontal scaling considerations including:
* Use distributed caching; for multi-cluster deployments, use external caching for cluster members to share the same state. For details on the relevant configuration, see <@links.server id="caching" />. The embedded Infinispan cache has horizontal scaling considerations including:
- Your instances need a way to discover each other. For more information, see discovery in <@links.server id="caching" />.
- This cache is not optimal for clusters that span multiple availability zones, which are also called stretch clusters. For embedded Infinispan cache, work to have all instances in one availability zone. The goal is to avoid unnecessary round-trips in the communication that would amplify in the response times. On Kubernetes, use Pod affinity to enforce this grouping of Pods.
- This cache does not gracefully handle multiple members joining or leaving concurrently. In particular, members leaving at the same time can lead to data loss. On Kubernetes, use a StatefulSet with the default serial handling to ensure Pods are started and stopped sequentially; using a Deployment is not supported or recommended. A sketch of this setup follows the list.
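If you manage the workload yourself instead of using the Operator, a plain StatefulSet with Pod anti-affinity covers the points above. The following manifest is only a sketch; the labels, names, and image reference are illustrative assumptions, and the full server configuration is omitted.
[source,yaml]
----
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: keycloak
spec:
  serviceName: keycloak
  replicas: 3
  podManagementPolicy: OrderedReady # default serial start/stop of Pods
  selector:
    matchLabels:
      app: keycloak
  template:
    metadata:
      labels:
        app: keycloak
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  app: keycloak
              topologyKey: kubernetes.io/hostname # schedule each Keycloak Pod on a separate node
      containers:
        - name: keycloak
          image: quay.io/keycloak/keycloak:latest # illustrative image reference
          args: ["start"] # database, hostname and TLS configuration omitted from this sketch
----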
To avoid losing service availability when a whole site is unavailable, see the high availability guide for more information on a multi-site deployment. See <@links.ha id="introduction" />.
To avoid losing service availability when a whole cluster is unavailable, see the high availability guide for more information on multi-cluster deployments. See <@links.ha id="introduction" />.
=== Horizontal Autoscaling
@ -83,7 +83,7 @@ spec:
averageUtilization: 80
----
NOTE: Scaling on memory is generally not needed with persistent sessions enabled, and should not be needed at all when using remote {jdgserver_name}. If you are using persistent sessions or remote {jdgserver_name} and you experience memory issues, it is best to fully diagnose the problem and revisit the <@links.ha id="concepts-memory-and-cpu-sizing" /> guide. Adjusting the memory request and limit is preferable to horizontal scaling.
NOTE: Scaling on memory is generally not needed with persistent sessions enabled, and should not be needed at all when using remote {jdgserver_name}. If you are using persistent sessions or remote {jdgserver_name} and you experience memory issues, it is best to fully diagnose the problem and revisit the <@links.ha id="single-cluster-concepts-memory-and-cpu-sizing" /> guide. Adjusting the memory request and limit is preferable to horizontal scaling.
Consult the https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/[Kubernetes docs] for additional information, including the usage of https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale-walkthrough/#autoscaling-on-multiple-metrics-and-custom-metrics[custom metrics].

View File

@ -1,21 +0,0 @@
<#import "/templates/guide.adoc" as tmpl>
<#import "/templates/links.adoc" as links>
<@tmpl.guide
title="Concepts for database connection pools"
summary="Understand concepts for avoiding resource exhaustion and congestion."
tileVisible="false" >
This section is intended when you want to understand considerations and best practices on how to configure database connection pools for {project_name}.
For a configuration where this is applied, visit <@links.ha id="deploy-keycloak-kubernetes" />.
== Concepts
Creating new database connections is expensive as it takes time.
Creating them when a request arrives will delay the response, so it is good to have them created before the request arrives.
It can also contribute to a https://en.wikipedia.org/wiki/Cache_stampede[stampede effect] where creating a lot of connections in a short time makes things worse as it slows down the system and blocks threads.
Closing a connection also invalidates all server side statements caching for that connection.
include::partials/database-connections/configure-db-connection-pool-best-practices.adoc[]
</@tmpl.guide>

View File

@ -1,169 +0,0 @@
<#import "/templates/guide.adoc" as tmpl>
<#import "/templates/links.adoc" as links>
<@tmpl.guide
title="Concepts for sizing CPU and memory resources"
summary="Understand concepts for avoiding resource exhaustion and congestion."
tileVisible="false" >
Use this as a starting point to size a product environment.
Adjust the values for your environment as needed based on your load tests.
== Performance recommendations
[WARNING]
====
* Performance will be lowered when scaling to more Pods (due to additional overhead) and using a cross-datacenter setup (due to additional traffic and operations).
* Increased cache sizes can improve the performance when {project_name} instances running for a longer time.
This will decrease response times and reduce IOPS on the database.
Still, those caches need to be filled when an instance is restarted, so do not set resources too tight based on the stable state measured once the caches have been filled.
* Use these values as a starting point and perform your own load tests before going into production.
====
Summary:
* The used CPU scales linearly with the number of requests up to the tested limit below.
Recommendations:
* The base memory usage for a Pod including caches of Realm data and 10,000 cached sessions is 1250 MB of RAM.
* In containers, Keycloak allocates 70% of the memory limit for heap-based memory. It will also use approximately 300 MB of non-heap-based memory.
To calculate the requested memory, use the calculation above. As memory limit, subtract the non-heap memory from the value above and divide the result by 0.7.
* For each 15 password-based user logins per second, allocate 1 vCPU to the cluster (tested with up to 300 per second).
+
{project_name} spends most of the CPU time hashing the password provided by the user, and it is proportional to the number of hash iterations.
* For each 120 client credential grants per second, 1 vCPU to the cluster (tested with up to 2000 per second).^*^
+
Most CPU time goes into creating new TLS connections, as each client runs only a single request.
* For each 120 refresh token requests per second, 1 vCPU to the cluster (tested with up to 435 refresh token requests per second).^*^
* Leave 150% extra head-room for CPU usage to handle spikes in the load.
This ensures a fast startup of the node, and enough capacity to handle failover tasks.
Performance of {project_name} dropped significantly when its Pods were throttled in our tests.
* When performing requests with more than 2500 different clients concurrently, not all client information will fit into {project_name}'s caches when those are using the standard cache sizes of 10000 entries each.
Due to this, the database may become a bottleneck as client data is reloaded frequently from the database.
To reduce the database usage, increase the `users` cache size by two times the number of concurrently used clients, and the `realms` cache size by four times the number of concurrently used clients.
{project_name}, which by default stores user sessions in the database, requires the following resources for optimal performance on an Aurora PostgreSQL multi-AZ database:
For every 100 login/logout/refresh requests per second:
- Budget for 1400 Write IOPS.
- Allocate between 0.35 and 0.7 vCPU.
The vCPU requirement is given as a range, as with an increased CPU saturation on the database host the CPU usage per request decreases while the response times increase. A lower CPU quota on the database can lead to slower response times during peak loads. Choose a larger CPU quota if fast response times during peak loads are critical. See below for an example.
=== Measuring the activity of a running {project_name} instance
Sizing of a {project_name} instance depends on the actual and forecasted numbers for password-based user logins, refresh token requests, and client credential grants as described in the previous section.
To retrieve the actual numbers of a running {project_name} instance for these three key inputs, use the metrics {project_name} provides:
* The user event metric `keycloak_user_events_total` for event type `login` includes both password-based logins and cookie-based logins, still it can serve as a first approximate input for this sizing guide.
* To find out number of password validations performed by {project_name} use the metric `keycloak_credentials_password_hashing_validations_total`.
The metric also contains tags providing some details about the hashing algorithm used and the outcome of the validation.
Here is the list of available tags: `realm`, `algorithm`, `hashing_strength`, `outcome`.
* Use the user event metric `keycloak_user_events_total` for the event types `refresh_token` and `client_login` for refresh token requests and client credential grants respectively.
See the <@links.observability id="event-metrics"/> and <@links.observability id="metrics-for-troubleshooting-http"/> {sections} for more information.
These metrics are crucial for tracking daily and weekly fluctuations in user activity loads,
identifying emerging trends that may indicate the need to resize the system and
validating sizing calculations.
By systematically measuring and evaluating these user event metrics,
you can ensure your system remains appropriately scaled and responsive to changes in user behavior and demand.
=== Calculation example (single site)
Target size:
* 45 logins and logouts per seconds
* 360 client credential grants per second^*^
* 360 refresh token requests per second (1:8 ratio for logins)^*^
* 3 Pods
Limits calculated:
* CPU requested per Pod: 3 vCPU
+
(45 logins per second = 3 vCPU, 360 client credential grants per second = 3 vCPU, 360 refresh tokens = 3 vCPU. This sums up to 9 vCPU total. With 3 Pods running in the cluster, each Pod then requests 3 vCPU)
* CPU limit per Pod: 7.5 vCPU
+
(Allow for an additional 150% CPU requested to handle peaks, startups and failover tasks)
* Memory requested per Pod: 1250 MB
+
(1250 MB base memory)
* Memory limit per Pod: 1360 MB
+
(1250 MB expected memory usage minus 300 non-heap-usage, divided by 0.7)
* Aurora Database instance: either `db.t4g.large` or `db.t4g.xlarge` depending on the required response times during peak loads.
+
(45 logins per second, 5 logouts per second, 360 refresh tokens per seconds.
This sums up to 410 requests per second.
This expected DB usage is 1.4 to 2.8 vCPU, with a DB idle load of 0.3 vCPU.
This indicates either a 2 vCPU `db.t4g.large` instance or a 4 vCPU `db.t4g.xlarge` instance.
A 2 vCPU `db.t4g.large` would be more cost-effective if the response times are allowed to be higher during peak usage.
In our tests, the median response time for a login and a token refresh increased by up to 120 ms once the CPU saturation reached 90% on a 2 vCPU `db.t4g.large` instance given this scenario.
For faster response times during peak usage, consider a 4 vCPU `db.t4g.xlarge` instance for this scenario.)
////
<#noparse>
./benchmark.sh eu-west-1 --scenario=keycloak.scenario.authentication.AuthorizationCode --server-url=${KEYCLOAK_URL} --realm-name=realm-0 --users-per-sec=45 --ramp-up=10 --refresh-token-period=2 --refresh-token-count=8 --logout-percentage=10 --measurement=600 --users-per-realm=20000 --log-http-on-failure
</#noparse>
////
=== Sizing a multi-site setup
To create the sizing an active-active Keycloak setup with two AZs in one AWS region, following these steps:
* Create the same number of Pods with the same memory sizing as above on the second site.
* The database sizing remains unchanged. Both sites will connect to the same database writer instance.
In regard to the sizing of CPU requests and limits, there are different approaches depending on the expected failover behavior:
Fast failover and more expensive::
Keep the CPU requests and limits as above for the second site. This way any remaining site can take over the traffic from the primary site immediately without the need to scale.
Slower failover and more cost-effective::
Reduce the CPU requests and limits as above by 50% for the second site. When one of the sites fails, scale the remaining site from 3 Pod to 6 Pods either manually, automated, or using a Horizontal Pod Autoscaler. This requires enough spare capacity on the cluster or cluster auto-scaling capabilities.
Alternative setup for some environments::
Reduce the CPU requests by 50% for the second site, but keep the CPU limits as above. This way, the remaining site can take the traffic, but only at the downside that the Nodes will experience CPU pressure and therefore slower response times during peak traffic.
The benefit of this setup is that the number of Pods does not need to scale during failovers which is simpler to set up.
== Reference architecture
The following setup was used to retrieve the settings above to run tests of about 10 minutes for different scenarios:
* OpenShift 4.17.x deployed on AWS via ROSA.
* Machine pool with `c7g.2xlarge` instances.^*^
* {project_name} deployed with the Operator and 3 pods in a high-availability setup with two sites in active/active mode.
* OpenShift's reverse proxy runs in the passthrough mode where the TLS connection of the client is terminated at the Pod.
* Database Amazon Aurora PostgreSQL in a multi-AZ setup.
* Default user password hashing with Argon2 and 5 hash iterations and minimum memory size 7 MiB https://cheatsheetseries.owasp.org/cheatsheets/Password_Storage_Cheat_Sheet.html#argon2id[as recommended by OWASP] (which is the default).
* Client credential grants do not use refresh tokens (which is the default).
* Database seeded with 20,000 users and 20,000 clients.
* Infinispan local caches at default of 10,000 entries, so not all clients and users fit into the cache, and some requests will need to fetch the data from the database.
* All authentication sessions in distributed caches as per default, with two owners per entries, allowing one failing Pod without losing data.
* All user and client sessions are stored in the database and are not cached in-memory as this was tested in a multi-site setup.
Expect a slightly higher performance for single-site setups as a fixed number of user and client sessions will be cached.
* OpenJDK 21
^*^ For non-ARM CPU architectures on AWS (`c7i`/`c7a` vs. `c7g`) we found that client credential grants and refresh token workloads were able to deliver up to two times the number of operations per CPU core, while password hashing was delivering a constant number of operations per CPU core. Depending on your workload and your cloud pricing, please run your own tests and make your own calculations for mixed workloads to find out which architecture delivers a better pricing for you.
</@tmpl.guide>

View File

@ -1,66 +0,0 @@
<#import "/templates/guide.adoc" as tmpl>
<#import "/templates/links.adoc" as links>
<@tmpl.guide
title="Deploying AWS Aurora in multiple availability zones"
summary="Deploy an AWS Aurora as the database building block in a multi-site deployment."
tileVisible="false" >
This topic describes how to deploy an Aurora regional deployment of a PostgreSQL instance across multiple availability zones to tolerate one or more availability zone failures in a given AWS region.
This deployment is intended to be used with the setup described in the <@links.ha id="concepts-multi-site"/> {section}.
Use this deployment with the other building blocks outlined in the <@links.ha id="bblocks-multi-site"/> {section}.
include::partials/blueprint-disclaimer.adoc[]
== Architecture
Aurora database clusters consist of multiple Aurora database instances, with one instance designated as the primary writer and all others as backup readers.
To ensure high availability in the event of availability zone failures, Aurora allows database instances to be deployed across multiple zones in a single AWS region.
In the event of a failure on the availability zone that is hosting the Primary database instance, Aurora automatically heals itself and promotes a reader instance from a non-failed availability zone to be the new writer instance.
.Aurora Multiple Availability Zone Deployment
image::high-availability/aurora-multi-az.dio.svg[]
See the https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/CHAP_AuroraOverview.html[AWS Aurora documentation] for more details on the semantics provided by Aurora databases.
This documentation follows AWS best practices and creates a private Aurora database that is not exposed to the Internet.
To access the database from a ROSA cluster, <<establish-peering-connections-with-rosa-clusters,establish a peering connection between the database and the ROSA cluster>>.
== Procedure
The following procedure contains two sections:
* Creation of an Aurora Multi-AZ database cluster with the name "keycloak-aurora" in eu-west-1.
* Creation of a peering connection between the ROSA cluster(s) and the Aurora VPC to allow applications deployed on the ROSA clusters to establish connections with the database.
=== Create Aurora database Cluster
include::partials/aurora/aurora-multiaz-create-procedure.adoc[]
[#establish-peering-connections-with-rosa-clusters]
=== Establish Peering Connections with ROSA clusters
Perform these steps once for each ROSA cluster that contains a {project_name} deployment.
include::partials/aurora/aurora-create-peering-connections.adoc[]
== Verifying the connection
include::partials/aurora/aurora-verify-peering-connections.adoc[]
[#connecting-aurora-to-keycloak]
== Connecting Aurora database with {project_name}
Now that an Aurora database has been established and linked with all of your ROSA clusters, here are the relevant {project_name} CR options to connect the Aurora database with {project_name}. These changes will be required in the <@links.ha id="deploy-keycloak-kubernetes" /> {section}. The JDBC url is configured to use the Aurora database writer endpoint.
. Update `spec.db.url` to be `jdbc:aws-wrapper:postgresql://$HOST:5432/keycloak` where `$HOST` is the
<<aurora-writer-url, Aurora writer endpoint URL>>.
. Ensure that the Secrets referenced by `spec.db.usernameSecret` and `spec.db.passwordSecret` contain usernames and passwords defined when creating Aurora.
</@tmpl.guide>
== Next steps
After successful deployment of the Aurora database continue with <@links.ha id="deploy-infinispan-kubernetes-crossdc" />

View File

@ -1,95 +0,0 @@
<#import "/templates/guide.adoc" as tmpl>
<#import "/templates/links.adoc" as links>
<@tmpl.guide
title="Deploying {project_name} for HA with the Operator"
summary="Deploy {project_name} for high availability with the {project_name} Operator as a building block."
tileVisible="false" >
This guide describes advanced {project_name} configurations for Kubernetes which are load tested and will recover from single Pod failures.
These instructions are intended for use with the setup described in the <@links.ha id="concepts-multi-site"/> {section}.
Use it together with the other building blocks outlined in the <@links.ha id="bblocks-multi-site"/> {section}.
== Prerequisites
* OpenShift or Kubernetes cluster running.
* Understanding of a <@links.operator id="basic-deployment" /> of {project_name} with the {project_name} Operator.
* Aurora AWS database deployed using the <@links.ha id="deploy-aurora-multi-az" /> {section}.
* {jdgserver_name} server deployed using the <@links.ha id="deploy-infinispan-kubernetes-crossdc" /> {section}.
* Running {project_name} with OpenJDK 21, which is the default for the containers distributed for {project_name}, as this enabled virtual threads for the JGroups communication.
== Procedure
. Determine the sizing of the deployment using the <@links.ha id="concepts-memory-and-cpu-sizing" /> {section}.
. Install the {project_name} Operator as described in the <@links.operator id="installation" /> {section}.
. Notice the configuration file below contains options relevant for connecting to the Aurora database from <@links.ha id="deploy-aurora-multi-az" anchor="connecting-aurora-to-keycloak" />
. Notice the configuration file below options relevant for connecting to the {jdgserver_name} server from <@links.ha id="deploy-infinispan-kubernetes-crossdc" anchor="connecting-infinispan-to-keycloak" />
. Build a custom {project_name} image which is link:{links_server_db_url}#preparing-keycloak-for-amazon-aurora-postgresql[prepared for usage with the Amazon Aurora PostgreSQL database].
. Deploy the {project_name} CR with the following values with the resource requests and limits calculated in the first step:
+
[source,yaml]
----
include::examples/generated/keycloak.yaml[tag=keycloak]
----
<1> The database connection pool initial, max and min size should be identical to allow statement caching for the database.
Adjust this number to meet the needs of your system.
As most requests will not touch the database due to the {project_name} embedded cache, this change can server several hundreds of requests per second.
See the <@links.ha id="concepts-database-connections" /> {section} for details.
<2> Specify the URL to your custom {project_name} image. If your image is optimized, set the `startOptimized` flag to `true`.
<3> Enable additional features for multi-site support like the loadbalancer probe `/lb-check`.
<4> To be able to analyze the system under load, enable the metrics endpoint.
== Verifying the deployment
Confirm that the {project_name} deployment is ready.
[source,bash]
----
kubectl wait --for=condition=Ready keycloaks.k8s.keycloak.org/keycloak
kubectl wait --for=condition=RollingUpdate=False keycloaks.k8s.keycloak.org/keycloak
----
== Optional: Load shedding
To enable load shedding, limit the number of queued requests.
.Load shedding with max queued http requests
[source,yaml,indent=0]
----
spec:
additionalOptions:
include::examples/generated/keycloak.yaml[tag=keycloak-queue-size]
----
All exceeding requests are served with an HTTP 503.
You might consider limiting the value for `http-pool-max-threads` further because multiple concurrent threads will lead to throttling by Kubernetes once the requested CPU limit is reached.
See the <@links.ha id="concepts-threads" /> {section} about load shedding for details.
== Optional: Disable sticky sessions
When running on OpenShift and the default passthrough Ingress setup as provided by the {project_name} Operator, the load balancing done by HAProxy is done by using sticky sessions based on the IP address of the source.
When running load tests, or when having a reverse proxy in front of HAProxy, you might want to disable this setup to avoid receiving all requests on a single {project_name} Pod.
Add the following supplementary configuration under the `spec` in the {project_name} Custom Resource to disable sticky sessions.
[source,yaml,subs="attributes+"]
----
spec:
ingress:
enabled: true
annotations:
# When running load tests, disable sticky sessions on the OpenShift HAProxy router
# to avoid receiving all requests on a single {project_name} Pod.
haproxy.router.openshift.io/balance: roundrobin
haproxy.router.openshift.io/disable_cookies: 'true'
----
</@tmpl.guide>

View File

@ -3,139 +3,57 @@
<#import "/templates/profile.adoc" as profile>
<@tmpl.guide
title="Multi-site deployments"
summary="Connect multiple {project_name} deployments in different sites to increase the overall availability." >
title="High availability overview"
summary="Explore the different {project_name} high-availability architectures" >
{project_name} supports deployments that consist of multiple {project_name} instances that connect to each other using its Infinispan caches; load balancers can distribute the load evenly across those instances.
Those setups are intended for a transparent network on a single site.
{project_name} supports different high-availability architectures, allowing system administrators to pick the deployment type most suitable
for their needs. Ease of deployment, cost and fault-tolerance guarantees are important considerations when determining the correct architecture
for your deployments.
The {project_name} high-availability guide goes one step further to describe setups across multiple sites.
While this setup adds additional complexity, that extra amount of high availability may be needed for some environments.
== Architectures
== When to use a multi-site setup
The following architectures are supported by {project_name}.
The multi-site deployment capabilities of {project_name} are targeted at use cases that:
=== Single cluster
* Are constrained to a single
<@profile.ifProduct>
AWS Region.
</@profile.ifProduct>
<@profile.ifCommunity>
AWS Region or an equivalent low-latency setup.
</@profile.ifCommunity>
* Permit planned outages for maintenance.
* Fit within a defined user and request count.
* Can accept the impact of periodic outages.
Deploy {project_name} in a single cluster, optionally "stretched" across multiple availability zones, using <@links.ha id="single-cluster-introduction" />.
Advantages::
* No external dependencies
* Deployment in a single {kubernetes} cluster or a set of virtual machines with transparent networking
* Tolerate availability-zone failures if deployed to multiple availability zones
Disadvantages::
* {kubernetes} cluster is a single point of failure:
** Control-plane failures could impact all {project_name} pods
=== Multi cluster
Connect two {project_name} clusters, deployed for example in different {kubernetes} clusters in two availability zones, to increase availability using <@links.ha id="multi-cluster-introduction" />.
Advantages::
* Tolerate availability-zone failure
* Tolerate {kubernetes} cluster failure
* Bridge two networks that do not offer transparent networking
* Regulatory compliance when distinct deployments are required
Disadvantages::
* Complexity:
** External load-balancer required
** Separate {jdgserver_name} cluster required on each site
* Cost:
** Additional load-balancer required
** Additional compute is required for external {jdgserver_name} clusters
** Two {kubernetes} control-planes must be provisioned
* Not supported with three or more availability zones
=== Next Steps
To learn more about the different high-availability architectures, please consult the individual guides.
<@profile.ifCommunity>
== Tested Configuration
We regularly test {project_name} with the following configuration:
</@profile.ifCommunity>
<@profile.ifProduct>
== Supported Configuration
</@profile.ifProduct>
* Two Openshift single-AZ clusters, in the same AWS Region
** Provisioned with https://www.redhat.com/en/technologies/cloud-computing/openshift/aws[Red Hat OpenShift Service on AWS] (ROSA),
<@profile.ifProduct>
either ROSA HCP or ROSA classic.
</@profile.ifProduct>
<@profile.ifCommunity>
using ROSA HCP.
</@profile.ifCommunity>
** Each Openshift cluster has all its workers in a single Availability Zone.
** OpenShift version
<@profile.ifProduct>
4.17 (or later).
</@profile.ifProduct>
<@profile.ifCommunity>
4.17.
</@profile.ifCommunity>
* Amazon Aurora PostgreSQL database
** High availability with a primary DB instance in one Availability Zone, and a synchronously replicated reader in the second Availability Zone
** Version ${properties["aurora-postgresql.version"]}
* AWS Global Accelerator, sending traffic to both ROSA clusters
* AWS Lambda
<@profile.ifCommunity>
triggered by ROSA's Prometheus and Alert Manager
</@profile.ifCommunity>
to automate failover
<@profile.ifProduct>
Any deviation from the configuration above is not supported and any issue must be replicated in that environment for support.
</@profile.ifProduct>
<@profile.ifCommunity>
While equivalent setups should work, you will need to verify the performance and failure behavior of your environment.
We provide functional tests, failure tests and load tests in the https://github.com/keycloak/keycloak-benchmark[Keycloak Benchmark Project].
</@profile.ifCommunity>
Read more on each item in the <@links.ha id="bblocks-multi-site" /> {section}.
<@profile.ifProduct>
== Maximum load
</@profile.ifProduct>
<@profile.ifCommunity>
== Tested load
We regularly test {project_name} with the following load:
</@profile.ifCommunity>
* 100,000 users
* 300 requests per second
<@profile.ifCommunity>
While we did not see a hard limit in our tests with these values, we ask you to test for higher volumes with horizontally and vertically scaled {project_name} name instances and databases.
</@profile.ifCommunity>
See the <@links.ha id="concepts-memory-and-cpu-sizing" /> {section} for more information.
== Limitations
<@profile.ifCommunity>
Even with the additional redundancy of the two sites, downtimes can still occur:
</@profile.ifCommunity>
* During upgrades of {project_name} or {jdgserver_name} both sites needs to be taken offline for the duration of the upgrade.
* During certain failure scenarios, there may be downtime of up to 5 minutes.
* After certain failure scenarios, manual intervention may be required to restore redundancy by bringing the failed site back online.
* During certain switchover scenarios, there may be downtime of up to 5 minutes.
For more details on limitations see the <@links.ha id="concepts-multi-site" /> {section}.
== Next steps
The different {sections} introduce the necessary concepts and building blocks.
For each building block, a blueprint shows how to set a fully functional example.
Additional performance tuning and security hardening are still recommended when preparing a production setup.
<@profile.ifCommunity>
== Concept and building block overview
* <@links.ha id="concepts-multi-site" />
* <@links.ha id="bblocks-multi-site" />
* <@links.ha id="concepts-database-connections" />
* <@links.ha id="concepts-threads" />
* <@links.ha id="concepts-memory-and-cpu-sizing" />
* <@links.ha id="concepts-infinispan-cli-batch" />
== Blueprints for building blocks
* <@links.ha id="deploy-aurora-multi-az" />
* <@links.ha id="deploy-infinispan-kubernetes-crossdc" />
* <@links.ha id="deploy-keycloak-kubernetes" />
* <@links.ha id="deploy-aws-accelerator-loadbalancer" />
* <@links.ha id="deploy-aws-accelerator-fencing-lambda" />
== Operational procedures
* <@links.ha id="operate-synchronize" />
* <@links.ha id="operate-site-offline" />
* <@links.ha id="operate-site-online" />
* <@links.ha id="health-checks-multi-site" />
* <@links.ha id="single-cluster-introduction" />
* <@links.ha id="multi-cluster-introduction" />
</@profile.ifCommunity>
</@tmpl.guide>

View File

@ -2,19 +2,20 @@
<#import "/templates/links.adoc" as links>
<@tmpl.guide
title="Building blocks multi-site deployments"
summary="Learn about building blocks and suggested setups for multi-site deployments." >
title="Building blocks multi-cluster deployments"
summary="Learn about building blocks and suggested setups for multi-cluster deployments."
tileVisible="false" >
The following building blocks are needed to set up a multi-site deployment with synchronous replication.
The following building blocks are needed to set up a multi-cluster deployment with synchronous replication.
The building blocks link to a blueprint with an example configuration.
They are listed in the order in which they need to be installed.
include::partials/blueprint-disclaimer.adoc[]
include::../partials/blueprint-disclaimer.adoc[]
== Prerequisites
* Understanding the concepts laid out in the <@links.ha id="concepts-multi-site"/> {section}.
* Understanding the concepts laid out in the <@links.ha id="multi-cluster-concepts"/> {section}.
== Two sites with low-latency connection
@ -37,22 +38,22 @@ Ensures that the instances are deployed and restarted as needed.
A synchronously replicated database across two sites.
*Blueprint:* <@links.ha id="deploy-aurora-multi-az"/>.
*Blueprint:* <@links.ha id="multi-cluster-deploy-aurora"/>.
== {jdgserver_name}
A deployment of {jdgserver_name} that leverages the {jdgserver_name}'s Cross-DC functionality.
*Blueprint:* <@links.ha id="deploy-infinispan-kubernetes-crossdc" /> using the {jdgserver_name} Operator, and connect the two sites using {jdgserver_name}'s Gossip Router.
*Blueprint:* <@links.ha id="multi-cluster-deploy-infinispan-kubernetes-crossdc" /> using the {jdgserver_name} Operator, and connect the two sites using {jdgserver_name}'s Gossip Router.
*Not considered:* Direct interconnections between the Kubernetes clusters on the network layer.
*Not considered:* Direct interconnections between the {kubernetes} clusters on the network layer.
It might be considered in the future.
== {project_name}
A clustered deployment of {project_name} in each site, connected to an external {jdgserver_name}.
*Blueprint:* <@links.ha id="deploy-keycloak-kubernetes" /> that includes connecting to the Aurora database and the {jdgserver_name} server.
*Blueprint:* <@links.ha id="multi-cluster-deploy-keycloak-kubernetes" /> that includes connecting to the Aurora database and the {jdgserver_name} server.
</@tmpl.guide>
@ -60,4 +61,4 @@ A clustered deployment of {project_name} in each site, connected to an external
A load balancer which checks the `/lb-check` URL of the {project_name} deployment in each site, plus an automation to detect {jdgserver_name} connectivity problems between the two sites.
*Blueprint:* <@links.ha id="deploy-aws-accelerator-loadbalancer"/> together with <@links.ha id="deploy-aws-accelerator-fencing-lambda"/>.
*Blueprint:* <@links.ha id="multi-cluster-deploy-aws-accelerator-loadbalancer"/> together with <@links.ha id="multi-cluster-deploy-aws-accelerator-fencing-lambda"/>.

View File

@ -0,0 +1,17 @@
<#import "/templates/guide.adoc" as tmpl>
<#import "/templates/links.adoc" as links>
<@tmpl.guide
title="Concepts for database connection pools"
summary="Understand concepts for avoiding resource exhaustion and congestion."
tileVisible="false" >
This section describes the considerations and best practices for configuring database connection pools for {project_name}.
For a configuration where this is applied, visit <@links.ha id="multi-cluster-deploy-keycloak-kubernetes" />.
[#multi-cluster-db-concepts]
== Concepts
include::../partials/database-connections/configure-db-connection-pool-best-practices.adoc[]
</@tmpl.guide>

View File

@ -6,20 +6,20 @@ title="Concepts to automate {jdgserver_name} CLI commands"
summary="{jdgserver_name} CLI commands can be automated by creating a `Batch` CR instance."
tileVisible="false" >
include::partials/infinispan/infinispan-attributes.adoc[]
include::../partials/infinispan/infinispan-attributes.adoc[]
When interacting with an external {jdgserver_name} in Kubernetes, the `Batch` CR allows you to automate this using standard `kubectl` commands.
When interacting with an external {jdgserver_name} in {kubernetes}, the `Batch` CR allows you to automate this using standard `kubectl` commands.
== When to use it
Use this when automating interactions on Kubernetes.
Use this when automating interactions on {kubernetes}.
This avoids providing usernames and passwords and checking shell script outputs and their status.
For human interactions, the CLI shell might still be a better fit.
== Example
The following `Batch` CR takes a site offline as described in the operational procedure <@links.ha id="operate-site-offline" />.
The following `Batch` CR takes a site offline as described in the operational procedure <@links.ha id="multi-cluster-operate-site-offline" />.
[source,yaml,subs="+attributes"]
----

View File

@ -0,0 +1,67 @@
<#import "/templates/guide.adoc" as tmpl>
<#import "/templates/links.adoc" as links>
<@tmpl.guide
title="Concepts for sizing CPU and memory resources"
summary="Understand concepts for avoiding resource exhaustion and congestion."
tileVisible="false" >
Use this as a starting point to size a product environment.
Adjust the values for your environment as needed based on your load tests.
[#multi-cluster-performance-recommendations]
== Performance recommendations
include::../partials/concepts/perf_recommendations.adoc[]
[#multi-cluster-measture-running-instance]
=== Measuring the activity of a running {project_name} instance
<#include "/high-availability/partials/concepts/perf_measuring_running_instance.adoc" />
[#multi-cluster-single-site-calculation]
=== Calculation example (single site)
include::../partials/concepts/perf_single_site_calculation.adoc[]
=== Sizing a multi-cluster setup
To create the sizing of an active-active Keycloak setup with two AZs in one AWS region, follow these steps:
* Create the same number of Pods with the same memory sizing as above on the second site.
* The database sizing remains unchanged. Both sites will connect to the same database writer instance.
In regard to the sizing of CPU requests and limits, there are different approaches depending on the expected failover behavior:
Fast failover and more expensive::
Keep the CPU requests and limits as above for the second site. This way any remaining site can take over the traffic from the primary site immediately without the need to scale.
Slower failover and more cost-effective::
Reduce the CPU requests and limits as above by 50% for the second site. When one of the sites fails, scale the remaining site from 3 Pods to 6 Pods either manually, through automation, or using a Horizontal Pod Autoscaler. This requires enough spare capacity on the cluster or cluster auto-scaling capabilities. A sizing sketch for the reduced second site follows this list.
Alternative setup for some environments::
Reduce the CPU requests by 50% for the second site, but keep the CPU limits as above. This way, the remaining site can take the traffic, but only at the downside that the Nodes will experience CPU pressure and therefore slower response times during peak traffic.
The benefit of this setup is that the number of Pods does not need to scale during failovers, which is simpler to set up.
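As a rough sketch of the "Slower failover and more cost-effective" approach, the second site could request roughly half of the per-Pod CPU derived in the single-site calculation example (3 vCPU requested, 7.5 vCPU limit, 1250 MB / 1360 MB memory), while the memory sizing stays unchanged. Treat these numbers as illustrative placeholders and derive your own values from load tests.
[source,yaml]
----
spec:
  instances: 3
  resources:
    requests:
      cpu: "1.5" # 50% of the 3 vCPU per-Pod request from the single-site example
      memory: 1250Mi # memory sizing stays the same on the second site
    limits:
      cpu: "3.75" # 50% of the 7.5 vCPU per-Pod limit from the single-site example
      memory: 1360Mi
----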
== Reference architecture
The following setup was used to retrieve the settings above to run tests of about 10 minutes for different scenarios:
* OpenShift 4.17.x deployed on AWS via ROSA.
* Machine pool with `c7g.2xlarge` instances.^*^
* {project_name} deployed with the Operator and 3 pods in a high-availability setup with two sites in active/active mode.
* OpenShift's reverse proxy runs in the passthrough mode where the TLS connection of the client is terminated at the Pod.
* Database Amazon Aurora PostgreSQL in a multi-AZ setup.
* Default user password hashing with Argon2 and 5 hash iterations and minimum memory size 7 MiB https://cheatsheetseries.owasp.org/cheatsheets/Password_Storage_Cheat_Sheet.html#argon2id[as recommended by OWASP] (which is the default).
* Client credential grants do not use refresh tokens (which is the default).
* Database seeded with 20,000 users and 20,000 clients.
* Infinispan local caches at default of 10,000 entries, so not all clients and users fit into the cache, and some requests will need to fetch the data from the database.
* All authentication sessions in distributed caches as per default, with two owners per entry, allowing one failing Pod without losing data.
* All user and client sessions are stored in the database and are not cached in-memory as this was tested in a multi-cluster setup.
Expect a slightly higher performance for single-site setups as a fixed number of user and client sessions will be cached.
* OpenJDK 21
^*^ For non-ARM CPU architectures on AWS (`c7i`/`c7a` vs. `c7g`) we found that client credential grants and refresh token workloads were able to deliver up to two times the number of operations per CPU core, while password hashing was delivering a constant number of operations per CPU core. Depending on your workload and your cloud pricing, please run your own tests and make your own calculations for mixed workloads to find out which architecture delivers a better pricing for you.
</@tmpl.guide>

View File

@ -0,0 +1,17 @@
<#import "/templates/guide.adoc" as tmpl>
<#import "/templates/links.adoc" as links>
<@tmpl.guide
title="Concepts for configuring thread pools"
summary="Understand concepts for avoiding resource exhaustion and congestion."
tileVisible="false" >
This section describes the considerations and best practices for configuring thread pools and connection pools for {project_name}.
For a configuration where this is applied, visit <@links.ha id="multi-cluster-deploy-keycloak-kubernetes" />.
[#multi-cluster-threads-concept]
== Concepts
<#include "/high-availability/partials/concepts/threads.adoc" />
</@tmpl.guide>

View File

@ -2,14 +2,15 @@
<#import "/templates/links.adoc" as links>
<@tmpl.guide
title="Concepts for multi-site deployments"
summary="Understand multi-site deployment with synchronous replication." >
title="Concepts for multi-cluster deployments"
summary="Understand multi-cluster deployment with synchronous replication."
tileVisible="false" >
This topic describes a highly available multi-site setup and the behavior to expect. It outlines the requirements of the high availability architecture and describes the benefits and tradeoffs.
This topic describes a highly available multi-cluster setup and the behavior to expect. It outlines the requirements of the high availability architecture and describes the benefits and tradeoffs.
== When to use this setup
Use this setup to provide {project_name} deployments that are able to tolerate site failures, reducing the likelihood of downtime.
Use this setup to provide {project_name} deployments that are able to tolerate {kubernetes} cluster failures, reducing the likelihood of downtime.
== Deployment, data storage and caching
@ -144,6 +145,6 @@ Therefore, tradeoffs exist between high availability and consistency. The focus
== Next steps
Continue reading in the <@links.ha id="bblocks-multi-site" /> {section} to find blueprints for the different building blocks.
Continue reading in the <@links.ha id="multi-cluster-building-blocks" /> {section} to find blueprints for the different building blocks.
</@tmpl.guide>

View File

@ -0,0 +1,56 @@
<#import "/templates/guide.adoc" as tmpl>
<#import "/templates/links.adoc" as links>
<@tmpl.guide
title="Deploying AWS Aurora in multiple availability zones"
summary="Deploy an AWS Aurora as the database building block in a multi-cluster deployment."
tileVisible="false" >
This topic describes how to deploy an Aurora regional deployment of a PostgreSQL instance across multiple availability zones to tolerate one or more availability zone failures in a given AWS region.
This deployment is intended to be used with the setup described in the <@links.ha id="multi-cluster-concepts"/> {section}.
Use this deployment with the other building blocks outlined in the <@links.ha id="multi-cluster-building-blocks"/> {section}.
include::../partials/blueprint-disclaimer.adoc[]
<#include "/high-availability/partials/aurora/aurora-architecture.adoc" />
[#multi-cluster-aurora-procedure]
== Procedure
The following procedure contains two sections:
* Creation of an Aurora Multi-AZ database cluster with the name "keycloak-aurora" in eu-west-1.
* Creation of a peering connection between the ROSA cluster(s) and the Aurora VPC to allow applications deployed on the ROSA clusters to establish connections with the database.
[#multi-cluster-aurora-create]
=== Create Aurora database Cluster
include::../partials/aurora/aurora-multiaz-create-procedure.adoc[]
[#multi-cluster-establish-peering-connection-with-rosa-cluster]
=== Establish Peering Connections with ROSA clusters
Perform these steps once for each ROSA cluster that contains a {project_name} deployment.
include::../partials/aurora/aurora-create-peering-connections.adoc[]
== Verifying the connection
include::../partials/aurora/aurora-verify-peering-connections.adoc[]
[#multi-cluster-aurora-connecting]
== Connecting Aurora database with {project_name}
Now that an Aurora database has been established and linked with all of your ROSA clusters, here are the relevant {project_name} CR options to connect the Aurora database with {project_name}. These changes will be required in the <@links.ha id="multi-cluster-deploy-keycloak-kubernetes" /> {section}. The JDBC URL is configured to use the Aurora database writer endpoint. A sketch of the resulting `db` section follows the steps below.
. Update `spec.db.url` to be `jdbc:aws-wrapper:postgresql://$HOST:5432/keycloak` where `$HOST` is the
<<aurora-writer-url, Aurora writer endpoint URL>>.
. Ensure that the Secrets referenced by `spec.db.usernameSecret` and `spec.db.passwordSecret` contain usernames and passwords defined when creating Aurora.
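Putting the steps above together, the `db` section of the Keycloak CR could look like the following sketch; the writer endpoint hostname and the Secret name and keys are placeholders for the values created in your environment.
[source,yaml]
----
spec:
  db:
    url: jdbc:aws-wrapper:postgresql://keycloak-aurora.cluster-example.eu-west-1.rds.amazonaws.com:5432/keycloak # placeholder writer endpoint
    usernameSecret:
      name: keycloak-db-secret # placeholder Secret holding the Aurora credentials
      key: username
    passwordSecret:
      name: keycloak-db-secret
      key: password
----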
</@tmpl.guide>
== Next steps
After successful deployment of the Aurora database, continue with <@links.ha id="multi-cluster-deploy-infinispan-kubernetes-crossdc" />.

View File

@ -3,31 +3,31 @@
<@tmpl.guide
title="Deploying an AWS Lambda to disable a non-responding site"
summary="Deploy an AWS Lambda as part of the load-balancer building block in a multi-site deployment."
summary="Deploy an AWS Lambda as part of the load-balancer building block in a multi-cluster deployment."
tileVisible="false" >
This {section} explains how to resolve split-brain scenarios between two sites in a multi-site deployment.
This {section} explains how to resolve split-brain scenarios between two sites in a multi-cluster deployment.
It also disables replication if one site fails, so the other site can continue to serve requests.
This deployment is intended to be used with the setup described in the <@links.ha id="concepts-multi-site"/> {section}.
Use this deployment with the other building blocks outlined in the <@links.ha id="bblocks-multi-site"/> {section}.
This deployment is intended to be used with the setup described in the <@links.ha id="multi-cluster-concepts"/> {section}.
Use this deployment with the other building blocks outlined in the <@links.ha id="multi-cluster-building-blocks"/> {section}.
include::partials/blueprint-disclaimer.adoc[]
include::../partials/blueprint-disclaimer.adoc[]
== Architecture
In the event of a network communication failure between sites in a multi-site deployment, it is no longer possible for the two sites to continue to replicate the data between them.
In the event of a network communication failure between sites in a multi-cluster deployment, it is no longer possible for the two sites to continue to replicate the data between them.
The {jdgserver_name} is configured with a `FAIL` failure policy, which ensures consistency over availability. Consequently, all user requests are served with an error message until the failure is resolved, either by restoring the network connection or by disabling cross-site replication.
In such scenarios, a quorum is commonly used to determine which sites are marked as online or offline.
However, as multi-site deployments only consist of two sites, this is not possible.
However, as multi-cluster deployments only consist of two sites, this is not possible.
Instead, we leverage "`fencing`" to ensure that when one of the sites is unable to connect to the other site, only one site remains in the load balancer configuration, and hence only this site is able to serve subsequent user requests.
In addition to the load balancer configuration, the fencing procedure disables replication between the two {jdgserver_name} clusters to allow serving user requests from the site that remains in the load balancer configuration.
As a result, the sites will be out-of-sync once the replication has been disabled.
To recover from the out-of-sync state, a manual re-sync is necessary as described in <@links.ha id="operate-synchronize" />.
This is why a site which is removed via fencing will not be re-added automatically when the network communication failure is resolved. The remove site should only be re-added once the two sites have been synchronized using the outlined procedure <@links.ha id="operate-site-online" />.
To recover from the out-of-sync state, a manual re-sync is necessary as described in <@links.ha id="multi-cluster-operate-synchronize" />.
This is why a site which is removed via fencing will not be re-added automatically when the network communication failure is resolved. The removed site should only be re-added once the two sites have been synchronized, following the procedure outlined in <@links.ha id="multi-cluster-operate-site-online" />.
In this {section} we describe how to implement fencing using a combination of https://prometheus.io/docs/alerting/latest/overview/[Prometheus Alerts]
and AWS Lambda functions.
@ -40,13 +40,13 @@ The logic in the AWS Lambda ensures that always one site entry remains in the lo
== Prerequisites
* ROSA HCP based multi-site Keycloak deployment
* ROSA HCP based multi-cluster Keycloak deployment
* AWS CLI Installed
* AWS Global Accelerator load balancer
* `jq` tool installed
== Procedure
. Enable Openshift user alert routing
. Enable OpenShift user alert routing
+
.Command:
[source,bash]
@ -109,7 +109,7 @@ ROLE_ARN=$(aws iam create-role \
</#noparse>
----
<1> A name of your choice to associate with the Lambda and related resources
<2> The AWS Region hosting your Kubernetes clusters
<2> The AWS Region hosting your {kubernetes} clusters
+
. Create and attach the 'LambdaSecretManager' Policy so that the Lambda can access AWS Secrets
+
@ -174,7 +174,7 @@ aws iam attach-role-policy \
LAMBDA_ZIP=/tmp/lambda.zip
cat << EOF > /tmp/lambda.py
include::examples/generated/fencing_lambda.py[]
include::../examples/generated/fencing_lambda.py[]
EOF
zip -FS --junk-paths ${LAMBDA_ZIP} /tmp/lambda.py
@ -196,7 +196,7 @@ aws lambda create-function \
--region eu-west-1 #<1>
</#noparse>
----
<1> The AWS Region hosting your Kubernetes clusters
<1> The AWS Region hosting your {kubernetes} clusters
+
. Expose a Function URL so the Lambda can be triggered as webhook
+
@ -210,7 +210,7 @@ aws lambda create-function-url-config \
--region eu-west-1 #<1>
</#noparse>
----
<1> The AWS Region hosting your Kubernetes clusters
<1> The AWS Region hosting your {kubernetes} clusters
+
. Allow public invocations of the Function URL
+
@ -227,11 +227,11 @@ aws lambda add-permission \
--region eu-west-1 # <1>
</#noparse>
----
<1> The AWS Region hosting your Kubernetes clusters
<1> The AWS Region hosting your {kubernetes} clusters
+
. Configure the Lambda's Environment variables:
+
.. In each Kubernetes cluster, retrieve the exposed {jdgserver_name} URL endpoint:
.. In each {kubernetes} cluster, retrieve the exposed {jdgserver_name} URL endpoint:
+
[source,bash]
----
@ -275,8 +275,8 @@ aws lambda update-function-configuration \
----
+
<1> The name of the AWS Global Accelerator used by your deployment
<2> The AWS Region hosting your Kubernetes cluster and Lambda function
<3> The name of one of your {jdgserver_name} sites as defined in <@links.ha id="deploy-infinispan-kubernetes-crossdc" />
<2> The AWS Region hosting your {kubernetes} cluster and Lambda function
<3> The name of one of your {jdgserver_name} sites as defined in <@links.ha id="multi-cluster-deploy-infinispan-kubernetes-crossdc" />
<4> The {jdgserver_name} endpoint URL associated with the CLUSER_1_NAME site
<5> The name of the second {jdgserver_name} site
<6> The {jdgserver_name} endpoint URL associated with the CLUSER_2_NAME site
@ -306,7 +306,7 @@ aws lambda get-function-url-config \
----
https://tjqr2vgc664b6noj6vugprakoq0oausj.lambda-url.eu-west-1.on.aws
----
. In each Kubernetes cluster, configure a Prometheus Alert routing to trigger the Lambda on split-brain
. In each {kubernetes} cluster, configure a Prometheus Alert routing to trigger the Lambda on split-brain
+
.Command:
[source,bash]
@ -314,12 +314,12 @@ https://tjqr2vgc664b6noj6vugprakoq0oausj.lambda-url.eu-west-1.on.aws
<#noparse>
NAMESPACE= # The namespace containing your deployments
kubectl apply -n ${NAMESPACE} -f - << EOF
include::examples/generated/ispn-site-a.yaml[tag=fencing-secret]
include::../examples/generated/ispn-site-a.yaml[tag=fencing-secret]
</#noparse>
---
include::examples/generated/ispn-site-a.yaml[tag=fencing-alert-manager-config]
include::../examples/generated/ispn-site-a.yaml[tag=fencing-alert-manager-config]
---
include::examples/generated/ispn-site-a.yaml[tag=fencing-prometheus-rule]
include::../examples/generated/ispn-site-a.yaml[tag=fencing-prometheus-rule]
----
<1> The username required to authenticate Lambda requests
<2> The password required to authenticate Lambda requests
@ -348,7 +348,7 @@ kubectl -n ${NAMESPACE} rollout status -w deployment/infinispan-router
<1> Scale down the {jdgserver_name} Operator so that the next step does not result in the deployment being recreated by the operator
<2> Scale down the Gossip Router deployment. Replace `$\{NAMESPACE}` with the namespace containing your {jdgserver_name} server
+
. Verify the `SiteOffline` event has been fired on a cluster by inspecting the *Observe* -> *Alerting* menu in the Openshift
. Verify the `SiteOffline` event has been fired on a cluster by inspecting the *Observe* -> *Alerting* menu in the OpenShift
console
+
. Inspect the Global Accelerator EndpointGroup in the AWS console; only a single endpoint should be present
@ -369,11 +369,11 @@ kubectl -n ${NAMESPACE} rollout status -w deployment/infinispan-router
+
. Inspect the `vendor_jgroups_site_view_status` metric in each site. A value of `1` indicates that the site is reachable.
+
. Update the Accelerator EndpointGroup to contain both Endpoints. See the <@links.ha id="operate-site-online" /> {section} for details.
. Update the Accelerator EndpointGroup to contain both Endpoints. See the <@links.ha id="multi-cluster-operate-site-online" /> {section} for details.
== Further reading
* <@links.ha id="operate-site-online" />
* <@links.ha id="operate-site-offline" />
* <@links.ha id="multi-cluster-operate-site-online" />
* <@links.ha id="multi-cluster-operate-site-offline" />
</@tmpl.guide>

View File

@ -3,15 +3,15 @@
<@tmpl.guide
title="Deploying an AWS Global Accelerator load balancer"
summary="Deploy an AWS Global Accelerator as the load-balancer building block in a multi-site deployment."
summary="Deploy an AWS Global Accelerator as the load-balancer building block in a multi-cluster deployment."
tileVisible="false" >
This topic describes the procedure required to deploy an AWS Global Accelerator to route traffic between multi-site {project_name} deployments.
This topic describes the procedure required to deploy an AWS Global Accelerator to route traffic between multi-cluster {project_name} deployments.
This deployment is intended to be used with the setup described in the <@links.ha id="concepts-multi-site"/> {section}.
Use this deployment with the other building blocks outlined in the <@links.ha id="bblocks-multi-site"/> {section}.
This deployment is intended to be used with the setup described in the <@links.ha id="multi-cluster-concepts"/> {section}.
Use this deployment with the other building blocks outlined in the <@links.ha id="multi-cluster-building-blocks"/> {section}.
include::partials/blueprint-disclaimer.adoc[]
include::../partials/blueprint-disclaimer.adoc[]
== Audience
@ -48,7 +48,7 @@ Perform the following on each of the {project_name} clusters:
+
.. Login to the ROSA cluster
+
.. Create a Kubernetes load balancer service
.. Create a {kubernetes} load balancer service
+
.Command:
[source,bash]
@ -298,7 +298,7 @@ To verify that the Global Accelerator is correctly configured to connect to the
== Further reading
* <@links.ha id="operate-site-online" />
* <@links.ha id="operate-site-offline" />
* <@links.ha id="multi-cluster-operate-site-online" />
* <@links.ha id="multi-cluster-operate-site-offline" />
</@tmpl.guide>

View File

@ -4,19 +4,19 @@
<@tmpl.guide
title="Deploying {jdgserver_name} for HA with the {jdgserver_name} Operator"
summary="Deploy {jdgserver_name} for high availability in multi availability zones on Kubernetes."
summary="Deploy {jdgserver_name} for high availability in multi availability zones on {kubernetes}."
tileVisible="false"
includedOptions="cache-remote-*">
include::partials/infinispan/infinispan-attributes.adoc[]
include::../partials/infinispan/infinispan-attributes.adoc[]
This {section} describes the procedures required to deploy {jdgserver_name} in a multiple-cluster environment (cross-site).
This {section} describes the procedures required to deploy {jdgserver_name} in a multi-cluster environment (cross-site).
For simplicity, this topic uses the minimum configuration possible that allows {project_name} to be used with an external {jdgserver_name}.
This {section} assumes two {ocp} clusters named `{site-a}` and `{site-b}`.
This is a building block following the concepts described in the <@links.ha id="concepts-multi-site" /> {section}.
See the <@links.ha id="introduction" /> {section} for an overview.
This is a building block following the concepts described in the <@links.ha id="multi-cluster-concepts" /> {section}.
See the <@links.ha id="multi-cluster-introduction" /> {section} for an overview.
[IMPORTANT]
@ -40,12 +40,12 @@ image::high-availability/infinispan-crossdc-az.dio.svg[]
== Prerequisites
include::partials/infinispan/infinispan-prerequisites.adoc[]
include::../partials/infinispan/infinispan-prerequisites.adoc[]
== Procedure
include::partials/infinispan/infinispan-install-operator.adoc[]
include::partials/infinispan/infinispan-credentials.adoc[]
include::../partials/infinispan/infinispan-install-operator.adoc[]
include::../partials/infinispan/infinispan-credentials.adoc[]
+
These commands must be executed on both {ocp} clusters.
@ -160,7 +160,7 @@ A basic example is provided in this {section} using the credentials, tokens, and
.The `Infinispan` CR for `{site-a}`
[source,yaml]
----
include::examples/generated/ispn-site-a.yaml[tag=infinispan-crossdc]
include::../examples/generated/ispn-site-a.yaml[tag=infinispan-crossdc]
----
<1> The cluster name
<2> Allows the cluster to be monitored by Prometheus.
@ -184,7 +184,7 @@ Note the differences in point 4, 11 and 13.
.The `Infinispan` CR for `{site-b}`
[source,yaml]
----
include::examples/generated/ispn-site-b.yaml[tag=infinispan-crossdc]
include::../examples/generated/ispn-site-b.yaml[tag=infinispan-crossdc]
----
. Creating the caches for {project_name}.
@ -202,25 +202,25 @@ The following example shows the `Cache` CR for `{site-a}`.
.Cache `actionTokens`
[source,yaml]
----
include::examples/generated/ispn-site-a.yaml[tag=infinispan-cache-actionTokens]
include::../examples/generated/ispn-site-a.yaml[tag=infinispan-cache-actionTokens]
----
+
.Cache `authenticationSessions`
[source,yaml]
----
include::examples/generated/ispn-site-a.yaml[tag=infinispan-cache-authenticationSessions]
include::../examples/generated/ispn-site-a.yaml[tag=infinispan-cache-authenticationSessions]
----
+
.Cache `loginFailures`
[source,yaml]
----
include::examples/generated/ispn-site-a.yaml[tag=infinispan-cache-loginFailures]
include::../examples/generated/ispn-site-a.yaml[tag=infinispan-cache-loginFailures]
----
+
.Cache `work`
[source,yaml]
----
include::examples/generated/ispn-site-a.yaml[tag=infinispan-cache-work]
include::../examples/generated/ispn-site-a.yaml[tag=infinispan-cache-work]
----
<1> The transaction mode.
<2> The locking mode used by the transaction.
@ -256,7 +256,7 @@ For `{site-b}`, the `Cache` CR is similar, except for the `backups.<name>` outli
.Example for `actionTokens` cache in `{site-b}`
[source,yaml]
----
include::examples/generated/ispn-site-b.yaml[tag=infinispan-cache-actionTokens]
include::../examples/generated/ispn-site-b.yaml[tag=infinispan-cache-actionTokens]
----
== Verifying the deployment
@ -278,26 +278,26 @@ kubectl wait --for condition=CrossSiteViewFormed --timeout=300s infinispans.infi
[#connecting-infinispan-to-keycloak]
== Connecting {jdgserver_name} with {project_name}
Now that the {jdgserver_name} server is running, here are the relevant {project_name} CR changes necessary to connect it to {project_name}. These changes will be required in the <@links.ha id="deploy-keycloak-kubernetes" /> {section}.
Now that the {jdgserver_name} server is running, here are the relevant {project_name} CR changes necessary to connect it to {project_name}. These changes will be required in the <@links.ha id="multi-cluster-deploy-keycloak-kubernetes" /> {section}.
. Create a Secret with the username and password to connect to the external {jdgserver_name} deployment:
+
[source,yaml]
----
include::examples/generated/keycloak-ispn.yaml[tag=keycloak-ispn-secret]
include::../examples/generated/keycloak-ispn.yaml[tag=keycloak-ispn-secret]
----
. Extend the {project_name} Custom Resource with `additionalOptions` as shown below.
+
[NOTE]
====
All the memory, resource and database configurations are skipped from the CR below as they have been described in the <@links.ha id="deploy-keycloak-kubernetes" /> {section} already.
All the memory, resource and database configurations are skipped from the CR below as they have been described in the <@links.ha id="multi-cluster-deploy-keycloak-kubernetes" /> {section} already.
Administrators should leave those configurations untouched.
====
+
[source,yaml]
----
include::examples/generated/keycloak-ispn.yaml[tag=keycloak-ispn]
include::../examples/generated/keycloak-ispn.yaml[tag=keycloak-ispn]
----
<1> The hostname of the remote {jdgserver_name} cluster.
<2> The port of the remote {jdgserver_name} cluster.
@ -314,6 +314,6 @@ In other environments, add the necessary certificates to {project_name}'s trusts
== Next steps
After the Aurora AWS database and {jdgserver_name} are deployed and running, use the procedure in the <@links.ha id="deploy-keycloak-kubernetes" /> {section} to deploy {project_name} and connect it to all previously created building blocks.
After the AWS Aurora database and {jdgserver_name} are deployed and running, use the procedure in the <@links.ha id="multi-cluster-deploy-keycloak-kubernetes" /> {section} to deploy {project_name} and connect it to all previously created building blocks.
</@tmpl.guide>

View File

@ -0,0 +1,53 @@
<#import "/templates/guide.adoc" as tmpl>
<#import "/templates/links.adoc" as links>
<@tmpl.guide
title="Deploying {project_name} for HA with the Operator"
summary="Deploy {project_name} for high availability with the {project_name} Operator as a building block."
tileVisible="false" >
This guide describes advanced {project_name} configurations for {kubernetes} which are load tested and will recover from single Pod failures.
These instructions are intended for use with the setup described in the <@links.ha id="multi-cluster-concepts"/> {section}.
Use it together with the other building blocks outlined in the <@links.ha id="multi-cluster-building-blocks"/> {section}.
== Prerequisites
* {kubernetes} cluster running.
* Understanding of a <@links.operator id="basic-deployment" /> of {project_name} with the {project_name} Operator.
* AWS Aurora database deployed using the <@links.ha id="multi-cluster-deploy-aurora" /> {section}.
* {jdgserver_name} server deployed using the <@links.ha id="multi-cluster-deploy-infinispan-kubernetes-crossdc" /> {section}.
== Procedure
. Determine the sizing of the deployment using the <@links.ha id="multi-cluster-concepts-memory-and-cpu-sizing" /> {section}.
. Install the {project_name} Operator as described in the <@links.operator id="installation" /> {section}.
. Notice the configuration file below contains options relevant for connecting to the Aurora database from <@links.ha id="multi-cluster-deploy-aurora" anchor="multi-cluster-aurora-connecting" />
. Notice the configuration file below contains options relevant for connecting to the {jdgserver_name} server from <@links.ha id="multi-cluster-deploy-infinispan-kubernetes-crossdc" anchor="connecting-infinispan-to-keycloak" />
. Build a custom {project_name} image which is link:{links_server_db_url}#preparing-keycloak-for-amazon-aurora-postgresql[prepared for usage with the Amazon Aurora PostgreSQL database].
. Deploy the {project_name} CR with the following values with the resource requests and limits calculated in the first step:
+
[source,yaml]
----
include::../examples/generated/keycloak.yaml[tag=keycloak]
----
<1> The database connection pool initial, max and min size should be identical to allow statement caching for the database.
Adjust this number to meet the needs of your system.
As most requests will not touch the database due to the {project_name} embedded cache, this change can serve several hundred requests per second.
See the <@links.ha id="multi-cluster-concepts-database-connections" /> {section} for details.
<2> Specify the URL to your custom {project_name} image. If your image is optimized, set the `startOptimized` flag to `true`.
<3> Enable additional features for multi-cluster support, such as the load balancer probe `/lb-check`.
<4> To be able to analyze the system under load, enable the metrics endpoint.
<#include "/high-availability/partials/building-blocks/verifying-deployment.adoc" />
<#include "/high-availability/partials/building-blocks/load-shedding.adoc" />
<#include "/high-availability/partials/building-blocks/sticky-sessions.adoc" />
</@tmpl.guide>

View File

@ -2,14 +2,15 @@
<#import "/templates/links.adoc" as links>
<@tmpl.guide
title="Health checks for multi-site deployments"
summary="Validate the health of a multi-site deployment." >
title="Health checks for multi-cluster deployments"
summary="Validate the health of a multi-cluster deployment."
tileVisible="false" >
When running the <@links.ha id="introduction" /> in a Kubernetes environment,
When running the <@links.ha id="multi-cluster-introduction" /> in a {kubernetes} environment,
you should automate checks to see if everything is up and running as expected.
This page provides an overview of URLs,
Kubernetes resources, and Healthcheck endpoints available to verify a multi-site setup of {project_name}.
{kubernetes} resources, and Healthcheck endpoints available to verify a multi-cluster setup of {project_name}.
== Overview
@ -21,7 +22,7 @@ Ensuring high availability:: Verifying that all sites and the load balancer are
Maintaining performance:: Checking the health and distribution of the {jdgserver_name} cache ensures that {project_name} can maintain optimal performance by efficiently handling sessions and other temporary data.
Operational resilience:: By continuously monitoring the health of both {project_name} and its dependencies within the Kubernetes environment, the system can quickly identify and possibly auto-remediate issues, reducing downtime.
Operational resilience:: By continuously monitoring the health of both {project_name} and its dependencies within the {kubernetes} environment, the system can quickly identify and possibly auto-remediate issues, reducing downtime.
== Prerequisites
@ -37,7 +38,7 @@ Verifies the health of the {project_name} application through its load balancer
This command returns the health status of the {project_name} application's connection to its configured database, thus confirming the reliability of database connections.
This command is available only on the management port and not from the external URL.
In a Kubernetes setup, the sub-status `health/ready` is checked periodically to make the Pod as ready.
In a {kubernetes} setup, the sub-status `health/ready` is checked periodically to mark the Pod as ready.
[source,bash]
----
@ -50,7 +51,7 @@ This command verifies the `lb-check` endpoint of the load balancer and ensures t
curl -s https://keycloak-load-balancer-url/lb-check
----
These commands will return the running status of the Site A and Site B of the {project_name} in a multi-site setup.
These commands return the running status of Site A and Site B of {project_name} in a multi-cluster setup.
[source,bash]
----
@ -90,7 +91,7 @@ curl <infinispan_user>:<infinispan_pwd> -s https://infinispan_rest_url/rest/v2/c
----
=== Overall, {jdgserver_name} system health
Uses the `kubectl` CLI tool to query the health status of {jdgserver_name} clusters and the {project_name} service in the specified namespace. This comprehensive check ensures that all components of the {project_name} deployment are operational and correctly configured within the Kubernetes environment.
Uses the `kubectl` CLI tool to query the health status of {jdgserver_name} clusters and the {project_name} service in the specified namespace. This comprehensive check ensures that all components of the {project_name} deployment are operational and correctly configured within the {kubernetes} environment.
[source,bash]
----
@ -100,8 +101,8 @@ kubectl get infinispan -n <NAMESPACE> -o json \
| jq 'reduce .[] as $item ([]; . + [keys[] | select($item[.] != "True")]) | if length == 0 then "HEALTHY" else "UNHEALTHY: " + (join(", ")) end'
----
=== {project_name} readiness in Kubernetes
Specifically, checks for the readiness and rolling update conditions of {project_name} deployments in Kubernetes,
=== {project_name} readiness in {kubernetes}
Specifically, checks for the readiness and rolling update conditions of {project_name} deployments in {kubernetes},
ensuring that the {project_name} instances are fully operational and not undergoing updates that could impact availability.
[source,bash]

View File

@ -0,0 +1,125 @@
<#import "/templates/guide.adoc" as tmpl>
<#import "/templates/links.adoc" as links>
<#import "/templates/profile.adoc" as profile>
<@tmpl.guide
title="Multi cluster deployments"
summary="Connect multiple {project_name} deployments in independent {kubernetes} clusters" >
{project_name} supports deployments that consist of multiple {project_name} instances that connect to each other using its embedded Infinispan caches. Load balancers can distribute the load evenly across those instances.
Those setups are intended for transparent networks; see <@links.ha id="single-cluster-introduction" /> for more details.
A multi-cluster setup adds additional components that allow non-transparent networks to be bridged,
in order to provide additional high availability that may be needed for some environments.
== When to use a multi-cluster setup
The multi-cluster deployment capabilities of {project_name} are targeted at use cases that:
* Are constrained to a single
<@profile.ifProduct>
AWS Region.
</@profile.ifProduct>
<@profile.ifCommunity>
AWS Region or an equivalent low-latency setup.
</@profile.ifCommunity>
* Permit planned outages for maintenance.
* Fit within a defined user and request count.
* Can accept the impact of periodic outages.
<@profile.ifCommunity>
[#multi-cluster-tested-configuration]
== Tested Configuration
We regularly test {project_name} with the following configuration:
</@profile.ifCommunity>
<@profile.ifProduct>
[#multi-cluster-supported-configuration]
== Supported Configuration
</@profile.ifProduct>
* Two OpenShift single-AZ clusters, in the same AWS Region
** Provisioned with https://www.redhat.com/en/technologies/cloud-computing/openshift/aws[Red Hat OpenShift Service on AWS] (ROSA),
<@profile.ifProduct>
either ROSA HCP or ROSA classic.
</@profile.ifProduct>
<@profile.ifCommunity>
using ROSA HCP.
</@profile.ifCommunity>
** Each OpenShift cluster has all its workers in a single Availability Zone.
** OpenShift version
<@profile.ifProduct>
4.17 (or later).
</@profile.ifProduct>
<@profile.ifCommunity>
4.17.
</@profile.ifCommunity>
* Amazon Aurora PostgreSQL database
** High availability with a primary DB instance in one Availability Zone, and a synchronously replicated reader in the second Availability Zone
** Version ${properties["aurora-postgresql.version"]}
* AWS Global Accelerator, sending traffic to both ROSA clusters
* AWS Lambda
<@profile.ifCommunity>
triggered by ROSA's Prometheus and Alert Manager
</@profile.ifCommunity>
to automate failover
<#include "/high-availability/partials/configuration-disclaimer.adoc" />
Read more on each item in the <@links.ha id="multi-cluster-building-blocks" /> {section}.
[#multi-cluster-load]
<#include "/high-availability/partials/tested-load.adoc" />
See the <@links.ha id="multi-cluster-concepts-memory-and-cpu-sizing" /> {section} for more information.
[#multi-cluster-limitations]
== Limitations
<@profile.ifCommunity>
Even with the additional redundancy of the two sites, downtimes can still occur:
</@profile.ifCommunity>
* During upgrades of {project_name} or {jdgserver_name}, both sites need to be taken offline for the duration of the upgrade.
* During certain failure scenarios, there may be downtime of up to 5 minutes.
* After certain failure scenarios, manual intervention may be required to restore redundancy by bringing the failed site back online.
* During certain switchover scenarios, there may be downtime of up to 5 minutes.
For more details on limitations see the <@links.ha id="multi-cluster-concepts" /> {section}.
== Next steps
The different {sections} introduce the necessary concepts and building blocks.
For each building block, a blueprint shows how to set up a fully functional example.
Additional performance tuning and security hardening are still recommended when preparing a production setup.
<@profile.ifCommunity>
== Concept and building block overview
* <@links.ha id="multi-cluster-concepts" />
* <@links.ha id="multi-cluster-building-blocks" />
* <@links.ha id="multi-cluster-concepts-database-connections" />
* <@links.ha id="multi-cluster-concepts-threads" />
* <@links.ha id="multi-cluster-concepts-memory-and-cpu-sizing" />
* <@links.ha id="multi-cluster-concepts-infinispan-cli-batch" />
== Blueprints for building blocks
* <@links.ha id="multi-cluster-deploy-aurora" />
* <@links.ha id="multi-cluster-deploy-infinispan-kubernetes-crossdc" />
* <@links.ha id="multi-cluster-deploy-keycloak-kubernetes" />
* <@links.ha id="multi-cluster-deploy-aws-accelerator-loadbalancer" />
* <@links.ha id="multi-cluster-deploy-aws-accelerator-fencing-lambda" />
== Operational procedures
* <@links.ha id="multi-cluster-operate-synchronize" />
* <@links.ha id="multi-cluster-operate-site-offline" />
* <@links.ha id="multi-cluster-operate-site-online" />
* <@links.ha id="multi-cluster-health-checks" />
</@profile.ifCommunity>
</@tmpl.guide>

View File

@ -3,7 +3,8 @@
<@tmpl.guide
title="Taking a site offline"
summary="Take a site offline so that it no longer processes client requests." >
summary="Take a site offline so that it no longer processes client requests."
tileVisible="false" >
== When to use this procedure
@ -19,11 +20,11 @@ Follow these steps to remove a site from the load balancer so that no traffic ca
. Determine the ARN of the Network Load Balancer (NLB) associated with the site to be kept online
+
<#include "partials/accelerator/nlb-arn.adoc" />
<#include "/high-availability/partials/accelerator/nlb-arn.adoc" />
+
. Update the Accelerator EndpointGroup to only include a single site
+
<#include "partials/accelerator/endpoint-group.adoc" />
<#include "/high-availability/partials/accelerator/endpoint-group.adoc" />
+
.Output:
[source,bash]

View File

@ -3,7 +3,8 @@
<@tmpl.guide
title="Bringing a site online"
summary="Bring a site online so that it can process client requests." >
summary="Bring a site online so that it can process client requests."
tileVisible="false" >
== When to use this procedure
@ -18,11 +19,11 @@ Follow these steps to re-add a Keycloak site to the AWS Global Accelerator so th
. Determine the ARN of the Network Load Balancer (NLB) associated with the site to be brought online
+
<#include "partials/accelerator/nlb-arn.adoc" />
<#include "/high-availability/partials/accelerator/nlb-arn.adoc" />
+
. Update the Accelerator EndpointGroup to include both sites
<#include "partials/accelerator/endpoint-group.adoc" />
<#include "/high-availability/partials/accelerator/endpoint-group.adoc" />
+
.Output:
[source,bash]

View File

@ -3,9 +3,10 @@
<@tmpl.guide
title="Synchronizing sites"
summary="Synchronize an offline site with an online site." >
summary="Synchronize an offline site with an online site."
tileVisible="false" >
include::partials/infinispan/infinispan-attributes.adoc[]
include::../partials/infinispan/infinispan-attributes.adoc[]
== When to use this procedure
@ -32,16 +33,16 @@ This will clear all {project_name} caches and prevents the {project_name} state
+
When deploying {project_name} using the {project_name} Operator, change the number of {project_name} instances in the {project_name} Custom Resource to 0.
<#include "partials/infinispan/infinispan-cli-connect.adoc" />
<#include "partials/infinispan/infinispan-cli-clear-caches.adoc" />
<#include "/high-availability/partials/infinispan/infinispan-cli-connect.adoc" />
<#include "/high-availability/partials/infinispan/infinispan-cli-clear-caches.adoc" />
Now we are ready to transfer the state from the active site to the offline site.
. Log in to your Active site
<#include "partials/infinispan/infinispan-cli-connect.adoc" />
<#include "/high-availability/partials/infinispan/infinispan-cli-connect.adoc" />
<#include "partials/infinispan/infinispan-cli-state-transfer.adoc" />
<#include "/high-availability/partials/infinispan/infinispan-cli-state-transfer.adoc" />
Now that the state is available in the offline site, {project_name} can be started again:
@ -58,10 +59,10 @@ No action required.
=== AWS Global Accelerator
Once the two sites have been synchronized, it is safe to add the previously offline site back to the Global Accelerator
EndpointGroup following the steps in the <@links.ha id="operate-site-online" /> {section}.
EndpointGroup following the steps in the <@links.ha id="multi-cluster-operate-site-online" /> {section}.
== Further reading
See <@links.ha id="concepts-infinispan-cli-batch" />.
See <@links.ha id="multi-cluster-concepts-infinispan-cli-batch" />.
</@tmpl.guide>

View File

@ -11,8 +11,8 @@ aws elbv2 describe-load-balancers \
--output text
</#noparse>
----
<1> The Kubernetes namespace containing the Keycloak deployment
<2> The AWS Region hosting the Kubernetes cluster
<1> The {kubernetes} namespace containing the Keycloak deployment
<2> The AWS Region hosting the {kubernetes} cluster
+
.Output:
[source,bash]

View File

@ -0,0 +1,14 @@
[#${parent}-aurora-architecture]
== Architecture
Aurora database clusters consist of multiple Aurora database instances, with one instance designated as the primary writer and all others as backup readers.
To ensure high availability in the event of availability zone failures, Aurora allows database instances to be deployed across multiple zones in a single AWS region.
In the event of a failure of the availability zone hosting the primary database instance, Aurora automatically heals itself and promotes a reader instance from a non-failed availability zone to be the new writer instance.
.Aurora Multiple Availability Zone Deployment
image::high-availability/aurora-multi-az.dio.svg[]
See the https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/CHAP_AuroraOverview.html[AWS Aurora documentation] for more details on the semantics provided by Aurora databases.
This documentation follows AWS best practices and creates a private Aurora database that is not exposed to the Internet.
To access the database from a ROSA cluster, <<${parent}-establish-peering-connection-with-rosa-cluster,establish a peering connection between the database and the ROSA cluster>>.

View File

@ -0,0 +1,24 @@
<#include "aurora-architecture.adoc" />
[#${parent}-aurora-procedure]
== Procedure
The following procedure contains two sections:
* Creation of an Aurora Multi-AZ database cluster with the name "keycloak-aurora" in eu-west-1.
* Creation of a peering connection between the ROSA cluster(s) and the Aurora VPC to allow applications deployed on the ROSA clusters to establish connections with the database.
[#${parent}-aurora-create]
=== Create Aurora database Cluster
include::../partials/aurora/aurora-multiaz-create-procedure.adoc[]
[#${parent}-establish-peering-connection-with-rosa-cluster]
=== Establish Peering Connection with ROSA cluster
include::../partials/aurora/aurora-create-peering-connections.adoc[]
[#${parent}-aurora-verify]
== Verifying the connection
include::../partials/aurora/aurora-verify-peering-connections.adoc[]

View File

@ -1,5 +1,5 @@
The simplest way to verify that a connection is possible between a ROSA cluster and an Aurora DB cluster is to deploy
`psql` on the Openshift cluster and attempt to connect to the writer endpoint.
`psql` on the OpenShift cluster and attempt to connect to the writer endpoint.
The following command creates a pod in the default namespace and establishes a `psql` connection with the Aurora cluster if possible.
Upon exiting the pod shell, the pod is deleted.
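A rough sketch of such a check is shown below; the image tag, writer endpoint, database name, and user are placeholders to adapt to your environment:
[source,bash]
----
# Placeholder values; replace the writer endpoint, database name and user with your own.
kubectl run psql-client -it --rm --restart=Never --image=postgres:15 -- \
  psql -h keycloak-aurora.cluster-example.eu-west-1.rds.amazonaws.com -U keycloak -d keycloak
----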

View File

@ -0,0 +1,18 @@
[#${parent}-load-shedding]
== Optional: Load shedding
To enable load shedding, limit the number of queued requests.
.Load shedding with max queued http requests
[source,yaml,indent=0]
----
spec:
additionalOptions:
include::../examples/generated/keycloak.yaml[tag=keycloak-queue-size]
----
Requests that exceed the limit are rejected with an HTTP 503 error.
You might consider limiting the value for `http-pool-max-threads` further because multiple concurrent threads will lead to throttling by {kubernetes} once the requested CPU limit is reached.
See the <@links.ha id="${parent}-concepts-threads" anchor="${parent}-load-shedding"/> {section} about load shedding for details.
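For example, a quick way to confirm that load shedding is active is to send a burst of concurrent requests and count the returned status codes; the URL below is a placeholder for your load balancer or route:
[source,bash]
----
KEYCLOAK_URL=https://keycloak.example.com # Placeholder; point this at your load balancer or route

# Requests above the configured queue limit should be answered with HTTP 503.
seq 1 500 | xargs -P 50 -I IGNORED \
  curl -s -o /dev/null -w '%{http_code}\n' "$KEYCLOAK_URL/realms/master/.well-known/openid-configuration" \
  | sort | uniq -c
----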

View File

@ -0,0 +1,19 @@
[#${parent}-sticky-sessions]
== Optional: Disable sticky sessions
When running on OpenShift with the default passthrough Ingress setup as provided by the {project_name} Operator, the load balancing done by HAProxy uses sticky sessions based on the IP address of the source.
When running load tests, or when having a reverse proxy in front of HAProxy, you might want to disable this setup to avoid receiving all requests on a single {project_name} Pod.
Add the following supplementary configuration under the `spec` in the {project_name} Custom Resource to disable sticky sessions.
[source,yaml,subs="attributes+"]
----
spec:
ingress:
enabled: true
annotations:
# When running load tests, disable sticky sessions on the OpenShift HAProxy router
# to avoid receiving all requests on a single {project_name} Pod.
haproxy.router.openshift.io/balance: roundrobin
haproxy.router.openshift.io/disable_cookies: 'true'
----

View File

@ -0,0 +1,10 @@
[#${parent}-verify-deployment]
== Verifying the deployment
Confirm that the {project_name} deployment is ready.
[source,bash]
----
kubectl wait --for=condition=Ready keycloaks.k8s.keycloak.org/keycloak
kubectl wait --for=condition=RollingUpdate=False keycloaks.k8s.keycloak.org/keycloak
----

View File

@ -0,0 +1,17 @@
Sizing of a {project_name} instance depends on the actual and forecasted numbers for password-based user logins, refresh token requests, and client credential grants as described in the previous section.
To retrieve the actual numbers of a running {project_name} instance for these three key inputs, use the metrics {project_name} provides:
* The user event metric `keycloak_user_events_total` for event type `login` includes both password-based logins and cookie-based logins; still, it can serve as a first approximation for this sizing guide.
* To find out the number of password validations performed by {project_name}, use the metric `keycloak_credentials_password_hashing_validations_total`.
The metric also contains tags providing some details about the hashing algorithm used and the outcome of the validation.
Here is the list of available tags: `realm`, `algorithm`, `hashing_strength`, `outcome`.
* Use the user event metric `keycloak_user_events_total` for the event types `refresh_token` and `client_login` for refresh token requests and client credential grants respectively.
See the <@links.observability id="event-metrics"/> and <@links.observability id="metrics-for-troubleshooting-http"/> {sections} for more information.
These metrics are crucial for tracking daily and weekly fluctuations in user activity loads,
identifying emerging trends that may indicate the need to resize the system and
validating sizing calculations.
By systematically measuring and evaluating these user event metrics,
you can ensure your system remains appropriately scaled and responsive to changes in user behavior and demand.
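As a minimal sketch, assuming the metrics endpoint is enabled, these counters can be scraped directly from the management interface; the host and port below are placeholders:
[source,bash]
----
KC_MGMT_URL=https://keycloak-mgmt.example.com:9000 # Placeholder for your management interface

# Password validations, tagged with realm, algorithm, hashing_strength and outcome
curl -sk "$KC_MGMT_URL/metrics" | grep keycloak_credentials_password_hashing_validations_total

# User event counters, including the login, refresh_token and client_login event types
curl -sk "$KC_MGMT_URL/metrics" | grep keycloak_user_events_total
----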

View File

@ -0,0 +1,49 @@
[WARNING]
====
* Performance will be lowered when scaling to more Pods (due to additional overhead) and using a multi-cluster setup (due to additional traffic and operations).
* Increased cache sizes can improve the performance when {project_name} instances are running for a longer time.
This will decrease response times and reduce IOPS on the database.
Still, those caches need to be filled when an instance is restarted, so do not set resources too tight based on the stable state measured once the caches have been filled.
* Use these values as a starting point and perform your own load tests before going into production.
====
Summary:
* The used CPU scales linearly with the number of requests up to the tested limit below.
Recommendations:
* The base memory usage for a Pod including caches of Realm data and 10,000 cached sessions is 1250 MB of RAM.
* In containers, Keycloak allocates 70% of the memory limit for heap-based memory. It will also use approximately 300 MB of non-heap-based memory.
To calculate the requested memory, use the calculation above. For the memory limit, subtract the non-heap memory from the value above and divide the result by 0.7.
* For each 15 password-based user logins per second, allocate 1 vCPU to the cluster (tested with up to 300 per second).
+
{project_name} spends most of the CPU time hashing the password provided by the user, and it is proportional to the number of hash iterations.
* For each 120 client credential grants per second, allocate 1 vCPU to the cluster (tested with up to 2000 per second).^*^
+
Most CPU time goes into creating new TLS connections, as each client runs only a single request.
* For each 120 refresh token requests per second, allocate 1 vCPU to the cluster (tested with up to 435 refresh token requests per second).^*^
* Leave 150% extra head-room for CPU usage to handle spikes in the load.
This ensures a fast startup of the node, and enough capacity to handle failover tasks.
Performance of {project_name} dropped significantly when its Pods were throttled in our tests.
* When performing requests with more than 2500 different clients concurrently, not all client information will fit into {project_name}'s caches when those are using the standard cache sizes of 10000 entries each.
Due to this, the database may become a bottleneck as client data is reloaded frequently from the database.
To reduce the database usage, increase the `users` cache size by two times the number of concurrently used clients, and the `realms` cache size by four times the number of concurrently used clients.
{project_name}, which by default stores user sessions in the database, requires the following resources for optimal performance on an Aurora PostgreSQL multi-AZ database:
For every 100 login/logout/refresh requests per second:
- Budget for 1400 Write IOPS.
- Allocate between 0.35 and 0.7 vCPU.
The vCPU requirement is given as a range because, as CPU saturation on the database host increases, the CPU usage per request decreases while the response times increase. A lower CPU quota on the database can lead to slower response times during peak loads. Choose a larger CPU quota if fast response times during peak loads are critical. See below for an example.

View File

@ -0,0 +1,43 @@
Target size:
* 45 logins and logouts per second
* 360 client credential grants per second^*^
* 360 refresh token requests per second (1:8 ratio for logins)^*^
* 3 Pods
Limits calculated:
* CPU requested per Pod: 3 vCPU
+
(45 logins per second = 3 vCPU, 360 client credential grants per second = 3 vCPU, 360 refresh tokens = 3 vCPU. This sums up to 9 vCPU total. With 3 Pods running in the cluster, each Pod then requests 3 vCPU)
* CPU limit per Pod: 7.5 vCPU
+
(Allow for an additional 150% CPU requested to handle peaks, startups and failover tasks)
* Memory requested per Pod: 1250 MB
+
(1250 MB base memory)
* Memory limit per Pod: 1360 MB
+
(1250 MB expected memory usage minus 300 non-heap-usage, divided by 0.7)
* Aurora Database instance: either `db.t4g.large` or `db.t4g.xlarge` depending on the required response times during peak loads.
+
(45 logins per second, 5 logouts per second, 360 refresh tokens per second.
This sums up to 410 requests per second.
The expected DB usage is 1.4 to 2.8 vCPU, with a DB idle load of 0.3 vCPU.
This indicates either a 2 vCPU `db.t4g.large` instance or a 4 vCPU `db.t4g.xlarge` instance.
A 2 vCPU `db.t4g.large` would be more cost-effective if the response times are allowed to be higher during peak usage.
In our tests, the median response time for a login and a token refresh increased by up to 120 ms once the CPU saturation reached 90% on a 2 vCPU `db.t4g.large` instance given this scenario.
For faster response times during peak usage, consider a 4 vCPU `db.t4g.xlarge` instance for this scenario.)
////
<#noparse>
./benchmark.sh eu-west-1 --scenario=keycloak.scenario.authentication.AuthorizationCode --server-url=${KEYCLOAK_URL} --realm-name=realm-0 --users-per-sec=45 --ramp-up=10 --refresh-token-period=2 --refresh-token-count=8 --logout-percentage=10 --measurement=600 --users-per-realm=20000 --log-http-on-failure
</#noparse>
////
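The same arithmetic can be reproduced with a small shell sketch; the inputs below are the illustrative targets from this example and should be replaced with your own numbers:
[source,bash]
----
# Illustrative inputs; replace them with your own measured or forecasted rates.
LOGINS_PER_SEC=45
CLIENT_CREDENTIAL_GRANTS_PER_SEC=360
REFRESH_TOKENS_PER_SEC=360
PODS=3

# vCPU requested for the whole cluster, using the ratios from the sizing guidelines
TOTAL_VCPU=$(echo "$LOGINS_PER_SEC/15 + $CLIENT_CREDENTIAL_GRANTS_PER_SEC/120 + $REFRESH_TOKENS_PER_SEC/120" | bc -l)
echo "CPU request per Pod: $(echo "$TOTAL_VCPU/$PODS" | bc -l) vCPU"
echo "CPU limit per Pod:   $(echo "$TOTAL_VCPU/$PODS*2.5" | bc -l) vCPU (150% head-room)"

# Memory: 1250 MB base usage, of which roughly 300 MB is non-heap; heap is 70% of the limit
echo "Memory request per Pod: 1250 MB"
echo "Memory limit per Pod:   $(echo "(1250-300)/0.7" | bc -l) MB"
----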

View File

@ -1,33 +1,11 @@
<#import "/templates/guide.adoc" as tmpl>
<#import "/templates/links.adoc" as links>
<@tmpl.guide
title="Concepts for configuring thread pools"
summary="Understand concepts for avoiding resource exhaustion and congestion."
tileVisible="false" >
This section is intended when you want to understand the considerations and best practices on how to configure thread pools connection pools for {project_name}.
For a configuration where this is applied, visit <@links.ha id="deploy-keycloak-kubernetes" />.
== Concepts
=== JGroups communications
// remove this paragraph once OpenJDK 17 is no longer supported on the server side.
// https://github.com/keycloak/keycloak/issues/31101
JGroups communications, which is used in single-site setups for the communication between {project_name} nodes, benefits from the use of virtual threads which are available in OpenJDK 21 when at least two cores are available for {project_name}.
This reduces the memory usage and removes the need to configure thread pool sizes.
Therefore, the use of OpenJDK 21 is recommended.
[#${parent}-quarkus-executor-pool]
=== Quarkus executor pool
{project_name} requests, as well as blocking probes, are handled by an executor pool. Depending on the available CPU cores, it has a maximum size of 50 or more threads.
Threads are created as needed, and will end when no longer needed, so the system will scale up and down automatically.
{project_name} allows configuring the maximum thread pool size by the link:{links_server_all-config_url}?q=http-pool-max-threads[`http-pool-max-threads`] configuration option. See <@links.ha id="deploy-keycloak-kubernetes" /> for an example.
{project_name} allows configuring the maximum thread pool size by the link:{links_server_all-config_url}?q=http-pool-max-threads[`http-pool-max-threads`] configuration option.
When running on Kubernetes, adjust the number of worker threads to avoid creating more load than what the CPU limit allows for the Pod to avoid throttling, which would lead to congestion.
When running on {kubernetes}, adjust the number of worker threads to avoid creating more load than what the CPU limit allows for the Pod to avoid throttling, which would lead to congestion.
When running on physical machines, adjust the number of worker threads to avoid creating more load than the node can handle to avoid congestion.
Congestion would result in longer response times and an increased memory usage, and eventually an unstable system.
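As a rough sketch, the option can be set explicitly at startup; the value below is only a placeholder that should be derived from the Pod's CPU limit and your own load tests, and when deploying with the Operator the equivalent setting belongs in the CR's `additionalOptions`:
[source,bash]
----
# Placeholder value; derive the thread count from the CPU limit and your load tests.
bin/kc.sh start --http-pool-max-threads=32
----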
@ -40,7 +18,7 @@ If you increase the number of database connections and the number of threads too
The number of database connections is configured via the link:{links_server_all-config_url}?q=db-pool[`Database` settings `db-pool-initial-size`, `db-pool-min-size` and `db-pool-max-size`] respectively.
Low numbers ensure fast response times for all clients, even if there is an occasional failing request during a load spike.
[#load-shedding]
[#${parent}-load-shedding]
=== Load Shedding
By default, {project_name} will queue all incoming requests infinitely, even if the request processing stalls.
@ -53,7 +31,7 @@ Assuming a {project_name} Pod processes around 200 requests per second, a queue
When this setting is active, requests that exceed the number of queued requests will return with an HTTP 503 error.
{project_name} logs the error message in its log.
[#probes]
[#${parent}-probes]
=== Probes
{project_name}'s liveness probe is non-blocking to avoid a restart of a Pod under a high load.
@ -62,10 +40,9 @@ When this setting is active, requests that exceed the number of queued requests
The overall health probe and the readiness probe can in some cases block to check the connection to the database, so they might fail under a high load.
Due to this, a Pod can become non-ready under a high load.
[#${parent}-os-resources]
=== OS Resources
In order for Java to create threads, when running on Linux it needs to have file handles available.
Therefore, the number of open files (as retrieved with `ulimit -n` on Linux) needs to provide head-space for {project_name} to increase the number of threads needed.
Each thread will also consume memory, and the container memory limits need to be set to a value that allows for this or the Pod will be killed by Kubernetes.
</@tmpl.guide>
Each thread will also consume memory, and the container memory limits need to be set to a value that allows for this or the Pod will be killed by {kubernetes}.

View File

@ -0,0 +1,7 @@
<@profile.ifProduct>
Any deviation from the configuration above is not supported and any issue must be replicated in that environment for support.
</@profile.ifProduct>
<@profile.ifCommunity>
While equivalent setups should work, you will need to verify the performance and failure behavior of your environment.
We provide functional tests, failure tests and load tests in the https://github.com/keycloak/keycloak-benchmark[Keycloak Benchmark Project].
</@profile.ifCommunity>

View File

@ -1,3 +1,8 @@
Creating new database connections is expensive as it takes time.
Creating them when a request arrives will delay the response, so it is good to have them created before the request arrives.
It can also contribute to a https://en.wikipedia.org/wiki/Cache_stampede[stampede effect] where creating a lot of connections in a short time makes things worse as it slows down the system and blocks threads.
Closing a connection also invalidates all server-side statement caching for that connection.
For the best performance, the values for the initial, minimal and maximum database connection pool size should all be equal.
This avoids creating new database connections when a new request comes in, which is costly.
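For example, a minimal sketch of pinning all three pool options to the same value (the value 30 is only a placeholder):
[source,bash]
----
# Placeholder pool size; align it with your database capacity and thread pool settings.
bin/kc.sh start \
  --db-pool-initial-size=30 \
  --db-pool-min-size=30 \
  --db-pool-max-size=30
----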

View File

@ -7,7 +7,7 @@ kubectl -n {ns} exec -it pods/{cluster-name}-0 -- ./bin/cli.sh --trustall --conn
----
+
It asks for the username and password for the {jdgserver_name} cluster.
Those credentials are the one set in the <@links.ha id="deploy-infinispan-kubernetes-crossdc"/> {section} in the configuring credentials section.
Those credentials are the ones set in the <@links.ha id="multi-cluster-deploy-infinispan-kubernetes-crossdc"/> {section} in the configuring credentials section.
+
.Output:
[source,bash,subs="+attributes"]

View File

@ -15,7 +15,7 @@ credentials:
+
The `identities.yaml` could be set in a secret as one of the following:
* As a Kubernetes Resource:
* As a {kubernetes} Resource:
+
.Credential Secret
[.wrap]

View File

@ -1,2 +1,2 @@
* OpenShift or Kubernetes cluster running
* {kubernetes} cluster running
* Understanding of the {infinispan-operator-docs}[{jdgserver_name} Operator]

View File

@ -0,0 +1,15 @@
<@profile.ifProduct>
== Maximum load
</@profile.ifProduct>
<@profile.ifCommunity>
== Tested load
We regularly test {project_name} with the following load:
</@profile.ifCommunity>
* 100,000 users
* 300 requests per second
<@profile.ifCommunity>
While we did not see a hard limit in our tests with these values, we ask you to test for higher volumes with horizontally and vertically scaled {project_name} instances and databases.
</@profile.ifCommunity>

View File

@ -1,16 +1,25 @@
introduction
concepts-multi-site
bblocks-multi-site
concepts-database-connections
concepts-threads
concepts-memory-and-cpu-sizing
concepts-infinispan-cli-batch
deploy-aurora-multi-az
deploy-infinispan-kubernetes-crossdc
deploy-keycloak-kubernetes
deploy-aws-accelerator-loadbalancer
deploy-aws-accelerator-fencing-lambda
operate-site-offline
operate-site-online
operate-synchronize
health-checks-multi-site
single-cluster/introduction
single-cluster/concepts
single-cluster/building-blocks
single-cluster/concepts-database-connections
single-cluster/concepts-threads
single-cluster/concepts-memory-and-cpu-sizing
single-cluster/deploy-aurora
single-cluster/deploy-keycloak
multi-cluster/introduction
multi-cluster/concepts
multi-cluster/building-blocks
multi-cluster/concepts-database-connections
multi-cluster/concepts-threads
multi-cluster/concepts-memory-and-cpu-sizing
multi-cluster/concepts-infinispan-cli-batch
multi-cluster/deploy-aurora
multi-cluster/deploy-infinispan-kubernetes-crossdc
multi-cluster/deploy-keycloak-kubernetes
multi-cluster/deploy-aws-accelerator-loadbalancer
multi-cluster/deploy-aws-accelerator-fencing-lambda
multi-cluster/operate-site-offline
multi-cluster/operate-site-online
multi-cluster/operate-synchronize
multi-cluster/health-checks

View File

@ -0,0 +1,47 @@
<#import "/templates/guide.adoc" as tmpl>
<#import "/templates/links.adoc" as links>
<@tmpl.guide
title="Building blocks single-cluster deployments"
summary="Learn about building blocks and suggested setups for single-cluster deployments."
tileVisible="false" >
The following building blocks are needed to set up a single-cluster deployment.
The building blocks link to a blueprint with an example configuration.
They are listed in the order in which they need to be installed.
include::../partials/blueprint-disclaimer.adoc[]
[#single-cluster-blocks-prerequisites]
== Prerequisites
* Understanding the concepts laid out in the <@links.ha id="single-cluster-concepts"/> {section}.
[#single-cluster-blocks-low-latency]
== Multiple availability-zones with low-latency connection
Ensures that synchronous replication is available for both the database and {project_name} clustering.
*Suggested setup:* A {kubernetes} cluster consisting of two or more AWS Availability Zones within the same AWS Region.
*Not considered:* {kubernetes} clusters spread across multiple regions on the same or different continents, as it would increase the latency and the likelihood of network failures.
Synchronous replication of databases as services with Aurora Regional Deployments on AWS is only available within the same region.
[#single-cluster-blocks-database]
== Database
A synchronously replicated database available across all availability-zones.
*Blueprint:* <@links.ha id="single-cluster-deploy-aurora"/>.
[#single-cluster-blocks-keycloak]
== {project_name}
A clustered deployment of {project_name} with pods distributed across availability-zones.
*Blueprint:* <@links.ha id="single-cluster-deploy-keycloak" />.
</@tmpl.guide>

View File

@ -0,0 +1,17 @@
<#import "/templates/guide.adoc" as tmpl>
<#import "/templates/links.adoc" as links>
<@tmpl.guide
title="Concepts for database connection pools"
summary="Understand concepts for avoiding resource exhaustion and congestion."
tileVisible="false" >
This section is intended for when you want to understand the considerations and best practices on how to configure database connection pools for {project_name}.
For a configuration where this is applied, visit <@links.ha id="single-cluster-deploy-keycloak" />.
[#single-cluster-db-concepts]
== Concepts
include::../partials/database-connections/configure-db-connection-pool-best-practices.adoc[]
</@tmpl.guide>

View File

@ -0,0 +1,27 @@
<#import "/templates/guide.adoc" as tmpl>
<#import "/templates/links.adoc" as links>
<@tmpl.guide
title="Concepts for sizing CPU and memory resources"
summary="Understand concepts for avoiding resource exhaustion and congestion."
tileVisible="false" >
Use this as a starting point to size a production environment.
Adjust the values for your environment as needed based on your load tests.
[#single-cluster-performance-recommendations]
== Performance recommendations
include::../partials/concepts/perf_recommendations.adoc[]
[#single-cluster-measture-running-instance]
=== Measuring the activity of a running {project_name} instance
<#include "/high-availability/partials/concepts/perf_measuring_running_instance.adoc" />
[#single-cluster-single-site-calculation]
=== Calculation example (single site)
include::../partials/concepts/perf_single_site_calculation.adoc[]
</@tmpl.guide>

View File

@ -0,0 +1,28 @@
<#import "/templates/guide.adoc" as tmpl>
<#import "/templates/links.adoc" as links>
<@tmpl.guide
title="Concepts for configuring thread pools"
summary="Understand concepts for avoiding resource exhaustion and congestion."
tileVisible="false" >
This section is intended for when you want to understand the considerations and best practices on how to configure thread pools and connection pools for {project_name}.
For a configuration where this is applied, visit <@links.ha id="single-cluster-deploy-keycloak" />.
[#single-cluster-threads-concept]
== Concepts
[#single-cluster-jgroups-communications]
=== JGroups communications
// remove this paragraph once OpenJDK 17 is no longer supported on the server side.
// https://github.com/keycloak/keycloak/issues/31101
JGroups communication, which is used in single-cluster setups for the communication between {project_name} nodes,
benefits from the use of virtual threads which are available in OpenJDK 21 when at least two cores are available for {project_name}.
This reduces the memory usage and removes the need to configure thread pool sizes.
Therefore, the use of OpenJDK 21 is recommended.
<#include "/high-availability/partials/concepts/threads.adoc" />
</@tmpl.guide>

View File

@ -0,0 +1,164 @@
<#import "/templates/guide.adoc" as tmpl>
<#import "/templates/links.adoc" as links>
<@tmpl.guide
title="Concepts for single-cluster deployments"
summary="Understand single-cluster deployment with synchronous replication."
tileVisible="false" >
This topic describes a single-cluster setup and the behavior to expect.
It outlines the requirements of the high availability architecture and describes the benefits and tradeoffs.
[#single-cluster-when-to-use]
== When to use this setup
Use this setup to provide {project_name} deployments that are deployed in an environment with transparent networking.
To provide a more concrete example, the following chapter assumes a deployment contained within a single {kubernetes} cluster.
The same concepts could be applied to a set of virtual or physical machines and a manual or scripted deployment.
== Single or multiple availability-zones
The behavior and high-availability guarantees of the {project_name} deployment are ultimately determined by the configuration of
the {kubernetes} cluster. Typically, {kubernetes} clusters are deployed in a single availability-zone; however, to
increase fault-tolerance, it is possible to https://kubernetes.io/docs/setup/best-practices/multiple-zones/[deploy the cluster across multiple availability-zones].
The {project_name} Operator defines the following anti-affinity strategy by default to prefer that {project_name} pods are
deployed on distinct nodes and distinct availability-zones when possible:
[source,yaml]
----
podAntiAffinity:
preferredDuringSchedulingIgnoredDuringExecution:
- weight: 100
podAffinityTerm:
labelSelector:
matchLabels:
app: keycloak
app.kubernetes.io/managed-by: keycloak-operator
app.kubernetes.io/component: server
app.kubernetes.io/instance: keycloak
topologyKey: topology.kubernetes.io/zone
- weight: 90
podAffinityTerm:
labelSelector:
matchLabels:
app: keycloak
app.kubernetes.io/managed-by: keycloak-operator
app.kubernetes.io/component: server
app.kubernetes.io/instance: keycloak
topologyKey: kubernetes.io/hostname
----
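To see how the scheduler actually distributed the Pods, a quick check such as the following can help; the label selector matches the defaults shown above:
[source,bash]
----
# Show the node of each Keycloak Pod, and the availability-zone label of each node.
kubectl get pods -l app=keycloak -o wide
kubectl get nodes -L topology.kubernetes.io/zone
----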
[IMPORTANT]
====
In order to ensure high-availability with multiple availability-zones, it is crucial that the Database is also able to
withstand zone failures as {project_name} depends on the underlying database to remain available.
====
== Failures which this setup can survive
Deploying {project_name} in a single availability-zone or across multiple availability-zones changes the high-availability characteristics
significantly; therefore, we consider these architectures independently.
=== Single Zone
[%autowidth]
|===
| Failure | Recovery | RPO^1^ | RTO^2^
| {project_name} Pod
| Multiple {project_name} Pods run in a cluster. If one instance fails, some incoming requests might receive an error message or be delayed for some seconds.
| No data loss
| Less than 30 seconds
| {kubernetes} Node
| Multiple {project_name} Pods run in a cluster. If the host node dies, then all pods on that node will fail and some incoming requests might receive an error message or be delayed for some seconds.
| No data loss
| Less than 30 seconds
| {project_name} Clustering Connectivity
| If the connectivity between {kubernetes} nodes is lost, data cannot be sent between {project_name} pods hosted on those nodes.
Incoming requests might receive an error message or be delayed for some seconds.
{project_name} will eventually remove the unreachable pods from its local view and stop sending data to them.
| No data loss
| Seconds to minutes
|===
.Table footnotes:
^1^ Recovery point objective, assuming all parts of the setup were healthy at the time this occurred. +
^2^ Recovery time objective. +
=== Multiple Zones
[%autowidth]
|===
| Failure | Recovery | RPO^1^ | RTO^2^
| Database node^3^
| If the writer instance fails, the database can promote a reader instance in the same or other zone to be the new writer.
| No data loss
| Seconds to minutes (depending on the database)
| {project_name} pod
| Multiple {project_name} instances run in a cluster. If one instance fails, some incoming requests might receive an error message or be delayed for some seconds.
| No data loss
| Less than 30 seconds
| {kubernetes} Node
| Multiple {project_name} pods run in a cluster. If the host node dies, then all pods on that node will fail and some incoming requests might receive an error message or be delayed for some seconds.
| No data loss
| Less than 30 seconds
| Availability zone failure
| If an availability-zone fails, all {project_name} pods hosted in that zone will also fail. Deploying at least the same number
of {project_name} replicas as availability-zones should ensure that no data is lost and minimal downtime occurs as there will
be other pods available to service requests.
| No data loss
| Seconds
| Connectivity database
| If the connectivity between availability-zones is lost, the synchronous replication will fail.
Some requests might receive an error message or be delayed for a few seconds.
Manual operations might be necessary depending on the database.
| No data loss^3^
| Seconds to minutes (depending on the database)
| {project_name} Clustering Connectivity
| If the connectivity between {kubernetes} nodes is lost, data cannot be sent between {project_name} pods hosted on those nodes.
Incoming requests might receive an error message or be delayed for some seconds.
{project_name} will eventually remove the unreachable pods from its local view and stop sending data to them.
| No data loss
| Seconds to minutes
|===
.Table footnotes:
^1^ Recovery point objective, assuming all parts of the setup were healthy at the time this occurred. +
^2^ Recovery time objective. +
^3^ Assumes that the database is also replicated across multiple availability-zones
== Known limitations
. Downtime during rollouts of {project_name} upgrades
+
This can be overcome for patch releases by enabling <@links.server id="update-compatibility" anchor="rolling-updates-for-patch-releases" />.
+
. Multiple node failures can result in a loss of entries from the `authenticationSessions`, `loginFailures`
and `actionTokens` caches if the number of node failures is greater than or equal to the cache's configured `num_owners`.
+
. Deployments using the default `preferredDuringSchedulingIgnoredDuringExecution` anti-affinity rules
may experience data loss on node/availability-zone failure if multiple pods are scheduled on the failed node/zone.
+
Users can prevent this scenario by explicitly configuring anti-affinity strategies with `requiredDuringSchedulingIgnoredDuringExecution`
to ensure that pods are always provisioned on distinct nodes or zones. However, this comes at the expense of
flexibility as {project_name} will not scale up to the expected number of pods if the defined rules cannot be satisfied.
+
See the Operator <@links.operator id="advanced-configuration" anchor="_scheduling" /> details of how to configure custom
anti-affinity strategies.
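
The following is a minimal sketch of such a rule, assuming the {project_name} pods carry the label `app: keycloak` and should be spread across availability-zones; adjust the labels and topology key to your environment and consult the Operator scheduling documentation referenced above for where to place it in the {project_name} CR.

[source,yaml]
----
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app: keycloak # assumed pod label, adjust to your deployment
        topologyKey: topology.kubernetes.io/zone # schedule at most one matching pod per availability-zone
----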
== Next steps
Continue reading in the <@links.ha id="single-cluster-building-blocks" /> {section} to find blueprints for the different building blocks.
</@tmpl.guide>

View File

@ -0,0 +1,32 @@
<#import "/templates/guide.adoc" as tmpl>
<#import "/templates/links.adoc" as links>
<@tmpl.guide
title="Deploying AWS Aurora in multiple availability zones"
summary="Deploy an AWS Aurora as the database building block in a single-cluster deployment."
tileVisible="false" >
This topic describes how to deploy a regional Aurora PostgreSQL instance across multiple availability zones to tolerate one or more availability zone failures in a given AWS region.
This deployment is intended to be used with the setup described in the <@links.ha id="single-cluster-concepts"/> {section}.
Use this deployment with the other building blocks outlined in the <@links.ha id="single-cluster-building-blocks"/> {section}.
include::../partials/blueprint-disclaimer.adoc[]
<#include "/high-availability/partials/aurora/aurora-single-site.adoc" />
[#single-cluster-aurora-connecting]
== Connecting Aurora database with {project_name}
Now that an Aurora database has been established and linked with your ROSA cluster, these are the relevant {project_name} CR options to connect the Aurora database with {project_name}. These changes are required in the <@links.ha id="single-cluster-deploy-keycloak" /> {section}. The JDBC URL is configured to use the Aurora database writer endpoint; a sketch of the resulting `db` section is shown after the steps below.
. Update `spec.db.url` to be `jdbc:aws-wrapper:postgresql://$HOST:5432/keycloak` where `$HOST` is the
<<aurora-writer-url, Aurora writer endpoint URL>>.
. Ensure that the Secrets referenced by `spec.db.usernameSecret` and `spec.db.passwordSecret` contain usernames and passwords defined when creating Aurora.
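
For illustration only, the resulting `db` section of the {project_name} CR could look like the following sketch, where the writer endpoint and the Secret name are placeholders for the values created in the previous steps; the complete CR is shown in the <@links.ha id="single-cluster-deploy-keycloak" /> {section}.

[source,yaml]
----
db:
  vendor: postgres
  url: jdbc:aws-wrapper:postgresql://<AWS_AURORA_URL_HERE>:5432/keycloak # Aurora writer endpoint
  usernameSecret:
    name: keycloak-db-secret # assumed Secret holding the Aurora credentials
    key: username
  passwordSecret:
    name: keycloak-db-secret
    key: password
----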
== Next steps

After successfully deploying the Aurora database, continue with the <@links.ha id="single-cluster-deploy-keycloak" /> {section}.

</@tmpl.guide>

View File

@ -0,0 +1,94 @@
<#import "/templates/guide.adoc" as tmpl>
<#import "/templates/links.adoc" as links>
<@tmpl.guide
title="Deploying {project_name} across multiple availability-zones with the Operator"
summary="Deploy {project_name} for high availability with the {project_name} Operator as a building block."
tileVisible="false" >
This guide describes advanced {project_name} configurations for {kubernetes} which are load tested and will recover from availability-zone
failures.
These instructions are intended for use with the setup described in the <@links.ha id="single-cluster-concepts"/> {section}.
Use it together with the other building blocks outlined in the <@links.ha id="single-cluster-building-blocks"/> {section}.
[#single-cluster-deploy-keycloak-prerequisites]
== Prerequisites
* {kubernetes} cluster deployed across multiple availability-zones with a worker-pool configured for each.
* Understanding of a <@links.operator id="basic-deployment" /> of {project_name} with the {project_name} Operator.
* AWS Aurora database deployed using the <@links.ha id="single-cluster-deploy-aurora" /> {section}.
[#single-cluster-deploy-keycloak-procedure]
== Procedure
. Determine the sizing of the deployment using the <@links.ha id="single-cluster-concepts-memory-and-cpu-sizing" /> {section}.
. Install the {project_name} Operator as described in the <@links.operator id="installation" /> {section}.
. Notice that the configuration file below contains options relevant for connecting to the Aurora database, as described in <@links.ha id="single-cluster-deploy-aurora" anchor="single-cluster-aurora-connecting" />.
. Build a custom {project_name} image which is link:{links_server_db_url}#preparing-keycloak-for-amazon-aurora-postgresql[prepared for usage with the Amazon Aurora PostgreSQL database].
. Deploy the {project_name} CR with the following values, using the resource requests and limits calculated in the first step:
+
[source,yaml]
----
apiVersion: k8s.keycloak.org/v2alpha1
kind: Keycloak
metadata:
labels:
app: keycloak
name: keycloak
namespace: keycloak
spec:
hostname:
hostname: <KEYCLOAK_URL_HERE>
resources:
requests:
cpu: "2"
memory: "1250M"
limits:
cpu: "6"
memory: "2250M"
db:
vendor: postgres
url: jdbc:aws-wrapper:postgresql://<AWS_AURORA_URL_HERE>:5432/keycloak
poolMinSize: 30 # <1>
poolInitialSize: 30
poolMaxSize: 30
usernameSecret:
name: keycloak-db-secret
key: username
passwordSecret:
name: keycloak-db-secret
key: password
image: <KEYCLOAK_IMAGE_HERE> # <2>
startOptimized: false # <2>
additionalOptions:
- name: log-console-output
value: json
- name: metrics-enabled # <3>
value: 'true'
- name: event-metrics-user-enabled
value: 'true'
- name: db-driver
value: software.amazon.jdbc.Driver
http:
tlsSecret: keycloak-tls-secret
instances: 3
----
<1> The initial, maximum, and minimum size of the database connection pool should be identical to allow statement caching for the database.
Adjust this number to meet the needs of your system.
As most requests will not touch the database due to the {project_name} embedded cache, this setup can serve several hundred requests per second.
See the <@links.ha id="single-cluster-concepts-database-connections" /> {section} for details.
<2> Specify the URL to your custom {project_name} image. If your image is optimized, set the `startOptimized` flag to `true`.
<3> To be able to analyze the system under load, enable the metrics endpoint.
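
The CR above references a Secret named `keycloak-db-secret` for the database credentials. As an illustration only, such a Secret could be created from a manifest like the following sketch, where the values are placeholders for the username and password defined when the Aurora database was provisioned.

[source,yaml]
----
apiVersion: v1
kind: Secret
metadata:
  name: keycloak-db-secret
  namespace: keycloak
type: Opaque
stringData:
  username: <AURORA_DB_USERNAME_HERE> # placeholder for the Aurora username
  password: <AURORA_DB_PASSWORD_HERE> # placeholder for the Aurora password
----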
<#include "/high-availability/partials/building-blocks/verifying-deployment.adoc" />
<#include "/high-availability/partials/building-blocks/load-shedding.adoc" />
<#include "/high-availability/partials/building-blocks/sticky-sessions.adoc" />
</@tmpl.guide>

View File

@ -0,0 +1,97 @@
<#import "/templates/guide.adoc" as tmpl>
<#import "/templates/links.adoc" as links>
<#import "/templates/profile.adoc" as profile>
<@tmpl.guide
title="Single cluster deployments"
summary="Deploy a single Keycloak cluster, optionally stretched across multiple availability-zones" >
== When to use a single cluster setup
The {project_name} single cluster architecture is targeted at use cases that:
* Deploy to an infrastructure with transparent networking, such as a single {kubernetes} cluster.
* Are constrained to a single
<@profile.ifProduct>
AWS Region.
</@profile.ifProduct>
<@profile.ifCommunity>
AWS Region or an equivalent low-latency setup.
</@profile.ifCommunity>
* Permit planned outages for maintenance.
* Fit within a defined user and request count.
* Can accept the impact of periodic outages.
<@profile.ifCommunity>
[#single-cluster-tested-configuration]
== Tested Configuration
We regularly test {project_name} with the following configuration:
</@profile.ifCommunity>
<@profile.ifProduct>
[#single-cluster-supported-configuration]
== Supported Configuration
</@profile.ifProduct>
* An OpenShift cluster deployed across three availability-zones
** Provisioned with https://www.redhat.com/en/technologies/cloud-computing/openshift/aws[Red Hat OpenShift Service on AWS] (ROSA),
<@profile.ifProduct>
either ROSA HCP or ROSA classic.
</@profile.ifProduct>
<@profile.ifCommunity>
using ROSA HCP.
</@profile.ifCommunity>
** At least one worker node for each availability-zone
** OpenShift version
<@profile.ifProduct>
4.17 (or later).
</@profile.ifProduct>
<@profile.ifCommunity>
4.17.
</@profile.ifCommunity>
* Amazon Aurora PostgreSQL database
** High availability with a primary DB instance in one Availability Zone, and synchronously replicated readers in the other Availability Zones
** Version ${properties["aurora-postgresql.version"]}
<#include "/high-availability/partials/configuration-disclaimer.adoc" />
Read more on each item in the <@links.ha id="single-cluster-building-blocks" /> {section}.
[#single-cluster-load]
<#include "/high-availability/partials/tested-load.adoc" />
[#single-cluster-limitations]
== Limitations
<@profile.ifCommunity>
Even with the additional redundancy of three availability-zones, downtime can still occur when:
</@profile.ifCommunity>
* Simultaneous node failures occur
* {project_name} upgrades are rolled out
* Infrastructure fails, for example the {kubernetes} cluster
For more details on limitations see the <@links.ha id="single-cluster-concepts" /> {section}.
== Next steps
The different {sections} introduce the necessary concepts and building blocks.
For each building block, a blueprint shows how to deploy a fully functional example.
Additional performance tuning and security hardening are still recommended when preparing a production setup.
<@profile.ifCommunity>
== Concept and building block overview
* <@links.ha id="single-cluster-concepts" />
* <@links.ha id="single-cluster-building-blocks" />
* <@links.ha id="single-cluster-concepts-database-connections" />
* <@links.ha id="single-cluster-concepts-threads" />
* <@links.ha id="single-cluster-concepts-memory-and-cpu-sizing" />
== Blueprints for building blocks
* <@links.ha id="single-cluster-deploy-aurora" />
* <@links.ha id="single-cluster-deploy-keycloak" />
</@profile.ifCommunity>
</@tmpl.guide>

View File

@ -2,7 +2,7 @@
<#import "/templates/links.adoc" as links>
<@tmpl.guide
title="Embedded Infinispan metrics for multi-site deployments"
title="Embedded Infinispan metrics for multi-cluster deployments"
summary="Use metrics to monitor caching health."
tileVisible="false"
>

View File

@ -2,7 +2,7 @@
<#import "/templates/links.adoc" as links>
<@tmpl.guide
title="Embedded Infinispan metrics for single site deployments"
title="Embedded Infinispan metrics for single cluster deployments"
summary="Use metrics to monitor caching health and cluster replication."
tileVisible="false"
>

View File

@ -97,10 +97,9 @@
<resources>
<resource>
<directory>${basedir}/</directory>
<includes>
<include>**/examples/**/*.*</include>
<include>**/partials/**/*.*</include>
</includes>
<excludes>
<exclude>**/templates/**</exclude>
</excludes>
</resource>
</resources>
</configuration>
@ -212,8 +211,7 @@
</goals>
<configuration>
<sourceDirectory>${basedir}/target/generated-guides/high-availability</sourceDirectory>
<outputDirectory>
${project.build.directory}/generated-docs/high-availability</outputDirectory>
<outputDirectory>${project.build.directory}/generated-docs/high-availability</outputDirectory>
</configuration>
</execution>
<execution>

View File

@ -37,7 +37,7 @@ Using distributed cache may lead to results where the SAML logout request would
to SAML session index to HTTP session mapping which would lead to unsuccessful logout.
[[_saml_logout_in_cross_dc]]
=== Logout in cross-site scenario
=== Logout in multi-cluster deployments
Special handling is needed for handling sessions that span multiple data centers. Imagine the following scenario:

View File

@ -1,39 +1,41 @@
package org.keycloak.guides;
import freemarker.template.TemplateException;
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.keycloak.guides.maven.GuideBuilder;
import org.keycloak.guides.maven.GuideMojo;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import java.io.File;
import java.io.IOException;
import java.util.Properties;
import freemarker.template.TemplateException;
public class DocsBuildDebugUtil {
public static void main(String[] args) throws IOException, TemplateException, ParserConfigurationException, SAXException {
File usrDir = new File(System.getProperty("user.dir"));
Properties properties = readPropertiesFromPomXml();
for (File srcDir: usrDir.toPath().resolve("docs/guides").toFile().listFiles(d -> d.isDirectory() && !d.getName().equals("templates"))) {
if (srcDir.getName().equals("target") || srcDir.getName().equals("src")) {
// those are standard maven folders, ignore them
continue;
}
File targetDir = usrDir.toPath().resolve("target/generated-guides/" + srcDir.getName()).toFile();
targetDir.mkdirs();
Path usrDir = Paths.get(System.getProperty("user.dir"));
Path guidesRoot = usrDir.resolve("docs/guides");
for (Path srcDir : GuideMojo.getSourceDirs(guidesRoot)) {
Path targetDir = usrDir.resolve("target").resolve("generated-guides").resolve(srcDir.getFileName());
Files.createDirectories(targetDir);
// put here all the entries needed from the parent pom.xml
GuideBuilder builder = new GuideBuilder(srcDir, targetDir, null, properties);
builder.build();
System.out.println("Guides generated to: " + targetDir.getAbsolutePath());
System.out.println("Guides generated to: " + targetDir);
}
}
@ -46,7 +48,11 @@ public class DocsBuildDebugUtil {
// parse pom.xml file - avoid adding Maven as a dependency here
DocumentBuilder db = dbf.newDocumentBuilder();
Document doc = db.parse(new File("pom.xml"));
NodeList propertiesXml = doc.getDocumentElement().getElementsByTagName("properties").item(0).getChildNodes();
NodeList propertiesXml = doc.getDocumentElement().getElementsByTagName("properties");
if (propertiesXml.getLength() == 0)
return properties;
propertiesXml = propertiesXml.item(0).getChildNodes();
for(int i = 0; i < propertiesXml.getLength(); ++i) {
Node item = propertiesXml.item(i);
if (!(item instanceof Element)) {
@ -56,5 +62,4 @@ public class DocsBuildDebugUtil {
}
return properties;
}
}

View File

@ -2,13 +2,14 @@
<#macro guide title summary priority=999 deniedCategories="" includedOptions="" excludedOptions="" preview="" tileVisible="true" previewDiscussionLink="">
:guide-id: ${id}
:guide-parent: ${parent}
:guide-title: ${title}
:guide-summary: ${summary}
:guide-priority: ${priority}
:guide-tile-visible: ${tileVisible}
:version: ${version}
include::../attributes.adoc[]
include::${attributes}
[[${id}]]
= ${title}

View File

@ -1,42 +1,51 @@
package org.keycloak.guides.maven;
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.util.Collections;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;
import java.util.stream.Stream;
public class Context {
private File srcDir;
private Options options;
private Features features;
private List<Guide> guides;
private final Options options;
private final Features features;
private final List<Guide> guides;
public Context(File srcDir) throws IOException {
this.srcDir = srcDir;
public Context(Path srcPath) throws IOException {
this.options = new Options();
this.features = new Features();
this.guides = new LinkedList<>();
Path partials = srcPath.resolve("partials");
Map<String, Integer> guidePriorities = loadPinnedGuides(srcPath);
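// Collect every guide template under the source tree, skipping the shared partials directory and index.adoc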
List<Path> guidePaths;
try (Stream<Path> files = Files.walk(srcPath)) {
guidePaths = files
.filter(Files::isRegularFile)
.filter(p -> !p.startsWith(partials))
.filter(p -> p.getFileName().toString().endsWith(".adoc"))
.filter(p -> !p.getFileName().toString().equals("index.adoc"))
.toList();
} catch (IOException e) {
throw new RuntimeException("Failed to load guides from " + srcPath, e);
}
GuideParser parser = new GuideParser();
Map<String, Integer> guidePriorities = loadPinnedGuides(new File(srcDir, "pinned-guides"));
for (File f : srcDir.listFiles((dir, f) -> f.endsWith(".adoc") && !f.equals("index.adoc"))) {
Guide guide = parser.parse(f);
for (Path guidePath : guidePaths) {
Guide guide = parser.parse(srcPath, guidePath);
if (guide != null) {
if (guidePriorities != null) {
Integer priority = guidePriorities.get(guide.getId());
if (priority != null) {
if (guide.getPriority() != Integer.MAX_VALUE) {
throw new RuntimeException("Guide is pinned, but has a priority specified: " + f.getName());
throw new RuntimeException("Guide is pinned, but has a priority specified: " + guidePath.getFileName());
}
guidePriorities.remove(guide.getId());
guide.setPriority(priority);
@ -44,7 +53,7 @@ public class Context {
}
if (!guide.isTileVisible() && guide.getPriority() == Integer.MAX_VALUE) {
throw new RuntimeException("Invisible tiles should be pinned or have an explicit priority: " + f.getName());
throw new RuntimeException("Invisible tiles should be pinned or have an explicit priority: " + guidePath.getFileName());
}
guides.add(guide);
@ -55,7 +64,7 @@ public class Context {
throw new RuntimeException("File 'pinned-guides' contains files that no longer exist or are misspelled: " + guidePriorities.keySet());
}
Collections.sort(guides, (o1, o2) -> {
guides.sort((o1, o2) -> {
if (o1.getPriority() == o2.getPriority()) {
return o1.getTitle().compareTo(o2.getTitle());
} else {
@ -76,22 +85,22 @@ public class Context {
return guides;
}
private Map<String, Integer> loadPinnedGuides(File pinnedGuides) throws IOException {
if (!pinnedGuides.isFile()) {
private Map<String, Integer> loadPinnedGuides(Path src) throws IOException {
Path pinnedGuides = src.resolve("pinned-guides");
if (Files.notExists(pinnedGuides) || Files.isDirectory(pinnedGuides)) {
return null;
}
Map<String, Integer> priorities = new HashMap<>();
try (BufferedReader br = new BufferedReader(new FileReader(pinnedGuides))) {
try (BufferedReader br = Files.newBufferedReader(pinnedGuides)) {
int c = 1;
for (String l = br.readLine(); l != null; l = br.readLine()) {
l = l.trim();
if (!l.isEmpty()) {
priorities.put(l, c);
priorities.put(Guide.toId(l), c);
}
c++;
}
return priorities;
}
}
}

View File

@ -1,48 +1,59 @@
package org.keycloak.guides.maven;
import java.io.IOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.StreamSupport;
import freemarker.template.Configuration;
import freemarker.template.Template;
import freemarker.template.TemplateException;
import freemarker.template.TemplateExceptionHandler;
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
public class FreeMarker {
private File targetDir;
private Map<String, Object> attributes;
private Configuration configuration;
private final Map<String, Object> attributes;
private final Configuration configuration;
public FreeMarker(File srcDir, Map<String, Object> attributes) throws IOException {
public FreeMarker(Path srcDir, Map<String, Object> attributes) throws IOException {
this.attributes = attributes;
configuration = new Configuration(Configuration.VERSION_2_3_31);
configuration.setDirectoryForTemplateLoading(srcDir);
configuration.setDirectoryForTemplateLoading(srcDir.toFile());
configuration.setDefaultEncoding("UTF-8");
configuration.setTemplateExceptionHandler(TemplateExceptionHandler.RETHROW_HANDLER);
configuration.setLogTemplateExceptions(false);
}
public void template(String template, File targetDir) throws IOException, TemplateException {
Template t = configuration.getTemplate(template);
File out = targetDir.toPath().resolve(template).toFile();
public void template(Path template, Path target) throws IOException, TemplateException {
// We cannot use Path directly for the templateName as \ will be used on Windows
String templateName = StreamSupport.stream(template.spliterator(), false)
.map(p -> p.getFileName().toString())
.collect(Collectors.joining("/"));
File parent = out.getParentFile();
if (!parent.isDirectory()) {
parent.mkdir();
}
Template t = configuration.getTemplate(templateName);
Path out = target.resolve(template);
Path parent = out.getParent();
Files.createDirectories(parent);
HashMap<String, Object> attrs = new HashMap<>(attributes);
attrs.put("id", template.split("/")[1].replace(".adoc", ""));
attrs.put("id", id(template));
attrs.put("attributes", "../".repeat(template.getNameCount() - 1) + "attributes.adoc[]");
attrs.put("parent", template.getNameCount() > 2 ? template.getName(1).toString() : "");
Writer w = new FileWriter(out, StandardCharsets.UTF_8);
t.process(attrs, w);
try(Writer w = Files.newBufferedWriter(out, StandardCharsets.UTF_8)) {
t.process(attrs, w);
}
}
private String id(Path p) {
p = p.getNameCount() > 2 ? p.subpath(1, p.getNameCount()) : p.getName(1);
return Guide.toId(p.toString());
}
}

View File

@ -1,5 +1,7 @@
package org.keycloak.guides.maven;
import java.nio.file.Path;
public class Guide {
private String template;
@ -8,6 +10,12 @@ public class Guide {
private String summary;
private int priority = Integer.MAX_VALUE;
private boolean tileVisible = true;
private Path root;
private Path path;
public static String toId(String path) {
return path.replace("/", "-").replace("\\", "-").replace(".adoc", "");
}
public String getTemplate() {
return template;
@ -56,4 +64,20 @@ public class Guide {
public void setTileVisible(boolean tileVisible) {
this.tileVisible = tileVisible;
}
public Path getRoot() {
return root;
}
public void setRoot(Path root) {
this.root = root;
}
public Path getPath() {
return path;
}
public void setPath(Path path) {
this.path = path;
}
}

View File

@ -1,23 +1,27 @@
package org.keycloak.guides.maven;
import freemarker.template.TemplateException;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import java.util.stream.Stream;
import org.apache.maven.plugin.logging.Log;
import org.keycloak.common.Version;
import java.io.File;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;
import freemarker.template.TemplateException;
public class GuideBuilder {
private final FreeMarker freeMarker;
private final File srcDir;
private final File targetDir;
private final Path srcDir;
private final Path targetDir;
private final Log log;
public GuideBuilder(File srcDir, File targetDir, Log log, Properties properties) throws IOException {
public GuideBuilder(Path srcDir, Path targetDir, Log log, Properties properties) throws IOException {
this.srcDir = srcDir;
this.targetDir = targetDir;
this.log = log;
@ -27,32 +31,29 @@ public class GuideBuilder {
globalAttributes.put("version", Version.VERSION);
globalAttributes.put("properties", properties);
this.freeMarker = new FreeMarker(srcDir.getParentFile(), globalAttributes);
this.freeMarker = new FreeMarker(srcDir.getParent(), globalAttributes);
}
public void build() throws TemplateException, IOException {
if (!srcDir.isDirectory()) {
if (!srcDir.mkdir()) {
throw new RuntimeException("Can't create folder " + srcDir);
}
Files.createDirectories(srcDir);
Path partials = srcDir.resolve("partials");
List<Path> templatePaths;
try (Stream<Path> files = Files.walk(srcDir)) {
templatePaths = files
.filter(Files::isRegularFile)
.filter(p -> !p.startsWith(partials))
.filter(p -> p.getFileName().toString().endsWith(".adoc"))
.toList();
} catch (IOException e) {
throw new RuntimeException("Failed to discover templates in " + srcDir, e);
}
for (String t : srcDir.list((dir, name) -> name.endsWith(".adoc"))) {
freeMarker.template(srcDir.getName() + "/" + t, targetDir.getParentFile());
for (Path path : templatePaths) {
Path relativePath = srcDir.getParent().relativize(path);
freeMarker.template(relativePath, targetDir.getParent());
if (log != null) {
log.info("Templated: " + srcDir.getName() + "/" + t);
}
}
File templatesDir = new File(srcDir, "templates");
if (templatesDir.isDirectory()) {
for (String t : templatesDir.list((dir, name) -> name.endsWith(".adoc"))) {
freeMarker.template(srcDir.getName() + "/" + templatesDir.getName() + "/" + t, targetDir.getParentFile());
if (log != null) {
log.info("Templated: " + templatesDir.getName() + "/" + t);
}
log.info("Templated: " + relativePath);
}
}
}
}

View File

@ -1,5 +1,12 @@
package org.keycloak.guides.maven;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Stream;
import org.apache.maven.plugin.AbstractMojo;
import org.apache.maven.plugin.MojoFailureException;
import org.apache.maven.plugin.logging.Log;
@ -8,9 +15,6 @@ import org.apache.maven.plugins.annotations.Mojo;
import org.apache.maven.plugins.annotations.Parameter;
import org.apache.maven.project.MavenProject;
import java.io.File;
import java.nio.file.Files;
@Mojo(name = "keycloak-guide", defaultPhase = LifecyclePhase.GENERATE_SOURCES, threadSafe = true)
public class GuideMojo extends AbstractMojo {
@ -27,25 +31,19 @@ public class GuideMojo extends AbstractMojo {
public void execute() throws MojoFailureException {
try {
Log log = getLog();
File topDir = new File(sourceDir);
Path src = Paths.get(sourceDir);
for (Path srcDir : getSourceDirs(src)) {
String dirName = srcDir.getFileName().toString();
Path targetRoot = Paths.get(targetDir);
Path targetDir = targetRoot.resolve("generated-guides").resolve(dirName);
Files.createDirectories(targetDir);
for (File srcDir : topDir.listFiles(d -> d.isDirectory() && !d.getName().equals("templates"))) {
if (srcDir.getName().equals("target") || srcDir.getName().equals("src")) {
// those are standard maven folders, ignore them
continue;
}
File targetDir = new File(new File(this.targetDir, "generated-guides"), srcDir.getName());
if (!targetDir.isDirectory()) {
targetDir.mkdirs();
}
if (srcDir.getName().equals("images")) {
log.info("Copy files from " + srcDir + " to " + targetDir);
Files.walkFileTree(srcDir.toPath(), new DirectoryCopyVisitor(targetDir.toPath()));
if (dirName.equals("images")) {
log.info("Copy files from " + srcDir + " to " + targetRoot);
Files.walkFileTree(srcDir, new DirectoryCopyVisitor(targetRoot));
} else {
log.info("Guide dir: " + srcDir.getAbsolutePath());
log.info("Target dir: " + targetDir.getAbsolutePath());
log.info("Guide dir: " + srcDir);
log.info("Target dir: " + targetDir);
GuideBuilder g = new GuideBuilder(srcDir, targetDir, log, project.getProperties());
g.build();
@ -57,4 +55,16 @@ public class GuideMojo extends AbstractMojo {
}
}
public static List<Path> getSourceDirs(Path src) throws IOException {
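// Returns the top-level guide source directories, skipping the standard Maven folders and the shared templates directory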
try (Stream<Path> fileList = Files.list(src)) {
return fileList
.filter(Files::isDirectory)
.filter(p ->
switch (p.getFileName().toString()) {
case "src", "target", "templates" -> false;
default -> true;
})
.toList();
}
}
}

View File

@ -1,9 +1,9 @@
package org.keycloak.guides.maven;
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
@ -14,26 +14,25 @@ public class GuideParser {
/**
* Parses a FreeMarker template to retrieve Guide attributes
* @param file
* @param guidePath
* @return A Guide instance; or <code>null</code> if not a guide
* @throws IOException
*/
public Guide parse(File file) throws IOException {
try (BufferedReader br = new BufferedReader(new FileReader(file))) {
public Guide parse(Path root, Path guidePath) throws IOException {
try (BufferedReader br = Files.newBufferedReader(guidePath)) {
String importName = getImportName(br);
String importElement = getGuideElement(br, importName);
if (importElement != null) {
Guide guide = new Guide();
guide.setTemplate(file.getName());
guide.setId(file.getName().replaceAll(".adoc", ""));
Path relativePath = root.relativize(guidePath);
guide.setId(Guide.toId(relativePath.toString()));
guide.setPath(guidePath);
guide.setRoot(root);
guide.setTemplate(relativePath.toString());
setAttributes(importElement, guide);
return guide;
}
return null;
}
}
@ -83,5 +82,4 @@ public class GuideParser {
}
}
}
}