Before "5ca23e3bf (Changed to use first_kube_control_plane to parse
kubeadm_certificate_key (#11875), 2025-01-14)", kubespray would have
problem adding new control planes when the order of the nodes in kubectl
output and the ansible inventory were not the same.
But the underlying problem is that the operation is fundamentally
something that should be done only once, and recorded for all host in
play.
Since `register` and `sef_fact` when used with `run_once` set the
variable for all the hosts, use it. Also allows to use the variable
directly instead of relying on hostvars to make the task more readable.
Most variables should have a default instead of relying on the default
filter.
(Note that the variable is misnomed, this should be certs and not keys,
but it's not worth breaking compat).
* control-plane: fix first_kube_control_plane delegation with kube_override_hostname
When kube_override_hostname is configured, the node names reported by
`kubectl get nodes` differ from the inventory_hostname known to Ansible.
This causes delegation failures in subsequent tasks since Ansible cannot
resolve the hostname from kubectl output to an inventory host.
Signed-off-by: Seena Fallah <seenafallah@gmail.com>
* control-plane: remove fragile first_control_plane selection logic
Current implementation breaks with kube_override_hostname and has
multiple edge cases. Drop until proper kubectl-based node lookup
can be implemented.
Signed-off-by: Seena Fallah <seenafallah@gmail.com>
---------
Signed-off-by: Seena Fallah <seenafallah@gmail.com>
The validation step is moved to the end to avoid the loss of files that
may lead to verification failure.
Signed-off-by: ChengHao Yang <17496418+tico88612@users.noreply.github.com>
* Refactor control plane upgrades with reconfiguration support
Adds revised support for:
- The previously removed `--config` argument for `kubeadm upgrade apply`
- Changes to `ClusterConfiguration` as part of the `upgrade-cluster.yml` playbook lifecycle
- kubeadm-config `v1beta4` `UpgradeConfiguration` for the `kubeadm upgrade apply` command: [UpgradeConfiguration v1beta4](https://kubernetes.io/docs/reference/config-api/kubeadm-config.v1beta4/#kubeadm-k8s-io-v1beta4-UpgradeConfiguration).
* Add kubeadm upgrade node support
Per discussion:
- Use `kubeadm upgrade node` on secondary control plane upgrades
- Add support for UpgradeConfiguration.node in kubeadm-config.v1beta4
- Remove redundant `allowRCUpgrades` config
- Revert from `block` for first and secondary control plane back to unblocked tasks since they no longer share much code and it's more readable this way
* Add kubelet and kube-proxy reconfiguration to upgrades
* Fix task to use `kubeadm init phase etcd local`
* Rebase with changes from "Adapt checksums and versions to new hashes updater" PR
* Add `imagePullPolicy` and `imagePullSerial` to kubeadm-config v1beta4 `InitConfiguration.nodeRegistration`
* Ensure correct `AuthorizationConfiguration` API version during upgrades
Fixes an issue where the wrong AuthorizationConfiguration API version could be used by kube-apiserver prematurely during upgrades.
The `kubernets/control-plane` role writes configuration for the target version before control plane pods are upgraded.
However, since the `AuthorizationConfiguration` file is reconciled continuously, this leads to a race condition where a new configuration version can be reconciled before kube-apiserver is upgraded to the compatible version.
This solution ensures the correct configuration is available throughout the process by writing each api version to a different file path. Unused file versions are cleaned up post-upgrade for better hygiene.
* Avoid from_json in cleanup task
This adds a new flag with default `kubeadm_config_validate_enabled: true` to use when debugging features and enhancements affected by the `kubeadm config validate command`.
This new flag should be set to `false` only for development and testing scenarios where validation is expected to fail (pre-release Kubernetes versions, etc).
While working with development and test versions of Kubernetes and Kubespray, I found this option very useful.
Adds the ability to configure the Kubernetes API server with a structured authorization configuration file.
Structured AuthorizationConfiguration is a new feature in Kubernetes v1.29+ (GA in v1.32) that configures the API server's authorization modes with a structured configuration file.
AuthorizationConfiguration files offer features not available with the `--authorization-mode` flag, although Kubespray supports both methods and authorization-mode remains the default for now.
Note: Because the `--authorization-config` and `--authorization-mode` flags are mutually exclusive, the `authorization_modes` ansible variable is ignored when `kube_apiserver_use_authorization_config_file` is set to true. The two features cannot be used at the same time.
Docs: https://kubernetes.io/docs/reference/access-authn-authz/authorization/#configuring-the-api-server-using-an-authorization-config-file
Blog + Examples: https://kubernetes.io/blog/2024/04/26/multi-webhook-and-modular-authorization-made-much-easier/
KEP: https://github.com/kubernetes/enhancements/tree/master/keps/sig-auth/3221-structured-authorization-configuration
I tested this all the way back to k8s v1.29 when AuthorizationConfiguration was first introduced as an alpha feature, although v1.29 required some additional workarounds with `kubeadm_patches`, which I included in example comments.
I also included some example comments with CEL expressions that allowed me to configure webhook authorizers without hitting kubeadm 1.29+ issues that block cluster creation and upgrades such as this one: https://github.com/kubernetes/cloud-provider-openstack/issues/2575.
My workaround configures the webhook to ignore requests from kubeadm and system components, which prevents fatal errors from webhooks that are not available yet, and should be authorized by Node or RBAC anyway.
* kubeadm: do not ignore preflight errors blindly
The "ignoring all errors" seems to date back to the inception of the
kubeadm support (it was --skip-preflight-check before).
This can mask real errors and prevent users from seeing them.
Do not ignore any errors by default and make the set of ignored errors
configurable.
* download/kubeadm: remove redundant task
The mode is already set by the previous `copy` task.
* Validate kubeadm configs
This should help to fail early when we have invalid kubeadm configs (from
a kubespray bug or a misconfiguration).
* kubeadm-upgrade: remove unnecessary bool cast
* Convert kubeadm join discovery timeout to v1beta4 config
* CI: Ignore kubeadm:Mem errors on some setup.
Currently there is not much difference between the files, if there are more changes in the future,
please use different files to distinguish them (you can use the kubeadm_config_api_version variable)
Signed-off-by: ChengHao Yang <17496418+tico88612@users.noreply.github.com>
I added the kubeadm_config_api_version variable in the previous commit,
and remove kubeadm api version condition.
Signed-off-by: ChengHao Yang <17496418+tico88612@users.noreply.github.com>
The fallback_ips tasks are essentially serializing the gathering of one
fact on all the hosts, which can have dramatic performance implications
on large clusters (several minutes).
This is essentially a reversal of 35f248dff0ddb430e2293af98ba73aa5062c89c1
Being able to run without refreshing the cache facts is not worth it.
We keep fallback_ip for now, simply changing the access to a normal
hostvars variable instead of a custom dictionnary.
Testing for group membership with group names makes Kubespray more
tolerant towards the structure of the inventory.
Where 'inventory_hostname in groups["some_group"] would fail if
"some_group" is not defined, '"some_group" in group_names' would not.
Specifying one directory for kubeadm patches is not ideal:
1. It does not allow working with multiples inventories easily
2. No ansible templating of the patch
3. Ansible path searching can sometimes be confusing
Instead, provide the patch directly in a variable, and add some quality
of life to handle components targeting and patch ordering more
explicitly (`target` and `type` which are translated to the kubeadm
scheme which is based on the file name)
kubernetes/control-plane and kubernetes/kubeadm roles both push kubeadm
patches in the same way.
Extract that code and make it a dependency of both.
This is safe because it's only configuration for kubeadm, which only
takes effect when kubeadm is run.
* feat: add user facing variable with default
* feat: remove rolebinding to anonymous users after init and upgrade
* feat: use file discovery for secondary control plane nodes
* feat: use file discovery for nodes
* fix: do not fail if rolebinding does not exist
* docs: add warning about kube_api_anonymous_auth
* style: improve readability of delegate_to parameter
* refactor: rename discovery kubeconfig file
* test: enable new variable in hardening and upgrade test cases
* docs: add option to config parameters
* test: multiple instances and upgrade
* Validate systemd unit files
This ensure that we fail early if we have a bad systemd unit file
(syntax error, using a version not available in the local version, etc)
* Hack to check systemd version for service files validation
factory-reset.target was introduced in system 250, same version as the
aliasing feature we need for verifying systemd services with ansible.
So we only actually executes the validation if that target is present.
This is an horrible hack which should be reverted as soon as we drop
support for distributions with systemd<250.
* Migrate node-role.kubernetes.io/master to node-role.kubernetes.io/control-plane
* Migrate node-role.kubernetes.io/master to node-role.kubernetes.io/control-plane
* Migrate node-role.kubernetes.io/master to node-role.kubernetes.io/control-plane