When joining a control plane node and "upgrading" the cluster setup (for
example, to update etcd addresses after adding a new etcd) in the same
playbook run, the node can take a bit of time to become ready after
joining.
This triggers a kubeadm preflight check (ControlPlaneNodesReady) in
kubeadm upgrade, which is run directly after the join tasks.
Add a configurable wait for the control plane node to become Ready to
fix this race condition.
Co-authored-by: Max Gautier <mg@max.gautier.name>
We currently **recursively** set the permissions of /etc/ssl/etcd/ssl
(default path) to 700. But this removes group permission from the files
under it, and certain composents (like calio with etcd datastore) rely
on it ; thus, the upgrade of a cluster can fail because the
calico-kube-controller can't access the certs, and thus the etcd.
This works in other case because as far as I can tell, the apiserver
which do access the etcd run as root (the owner of the files, not just
the "group owner")
We also for some reasons do this twice.
Only create the etcd cert directory with the correct permissions once,
not recursively.
Co-authored-by: Max Gautier <mg@max.gautier.name>
The config.json.j2 template was generating invalid JSON when multiple
crio_registry_auth entries were defined, resulting in multiple top-level
"auths" objects being rendered, e.g.:
{
"auths": { "registry1": { "auth": "xxxx" } },
"auths": { "registry2": { "auth": "yyyy" } }
}
This change moves the loop inside the "auths" object so that all registries
are rendered as siblings under a single "auths" key, producing valid JSON:
{
"auths": {
"registry1": { "auth": "xxxx" },
"registry2": { "auth": "yyyy" }
}
}
Co-authored-by: Martin Cahill <martin.cahill@gmail.com>
kubeadm errors out if 'all' is specified with specific checks, so check
that case when we add hardcoded checks.
Add a test to catch regression.
Co-authored-by: Max Gautier <mg@max.gautier.name>
The way to obtain the IP of a particular member is convoluted and depend
on multiple variables. The match is also textual and it's not clear
against what we're matching
It's also broken for etcd member which are not also Kubernetes nodes,
because the "Lookup node IP in kubernetes" task will fail and abort the
play.
Instead, match against 'peerURLs', which does not need new variable, and
use json output.
- Add testcase for etcd removal on external etcd
* Update pre-commit hooks
* CI: Put pre-commit cache under CI_PROJECT_DIR (#11929)
* CI: Put pre-commit cache under CI_PROJECT_DIR
Apparently gitlab-runner can't cache stuff outside of the project
directory.
Put the cache under CI_PROJECT_DIR to make it work (which also means we
need to ignore it from ansible-lint).
Also update the pre-commit image while we're at it.
Link: https://gitlab.com/gitlab-org/gitlab/-/issues/14151
* update ansible-lint pre-commit
Fixes a bug where `kube-apiserver` fails to start if the PodSecurity
configuration file doesn't have the `apiVersion` and `kind` keys.
Signed-off-by: Alejandro Macedo <alex.macedopereira@gmail.com>
Co-authored-by: Alejandro Macedo <alex.macedopereira@gmail.com>
* Refactor control plane upgrades with reconfiguration support
Adds revised support for:
- The previously removed `--config` argument for `kubeadm upgrade apply`
- Changes to `ClusterConfiguration` as part of the `upgrade-cluster.yml` playbook lifecycle
- kubeadm-config `v1beta4` `UpgradeConfiguration` for the `kubeadm upgrade apply` command: [UpgradeConfiguration v1beta4](https://kubernetes.io/docs/reference/config-api/kubeadm-config.v1beta4/#kubeadm-k8s-io-v1beta4-UpgradeConfiguration).
* Add kubeadm upgrade node support
Per discussion:
- Use `kubeadm upgrade node` on secondary control plane upgrades
- Add support for UpgradeConfiguration.node in kubeadm-config.v1beta4
- Remove redundant `allowRCUpgrades` config
- Revert from `block` for first and secondary control plane back to unblocked tasks since they no longer share much code and it's more readable this way
* Add kubelet and kube-proxy reconfiguration to upgrades
* Fix task to use `kubeadm init phase etcd local`
* Rebase with changes from "Adapt checksums and versions to new hashes updater" PR
* Add `imagePullPolicy` and `imagePullSerial` to kubeadm-config v1beta4 `InitConfiguration.nodeRegistration`
(cherry picked from commit b551fe083d)
[WARNING][1] kube-controllers/runconfig.go 193: unable to list KubeControllersConfiguration(default) error=connection is unauthorized: kubecontrollersconfigurations.crd.projectcalico.org "default" is forbidden: User "system:serviceaccount:kube-system:calico-kube-controllers" cannot list resource "kubecontrollersconfigurations" in API group "crd.projectcalico.org" at the cluster scope
Co-authored-by: darkobas <marko@datafund.io>
This avoids spurious failure with 'localhost'.
It should also be more correct the inventory contains uncached hosts
which are not in `k8s_cluster` and therefore should not be Kubespray
business.
(We still use hostvars for uncached hosts, because it's easier to select
on 'ansible_default_ipv4' that way and does not change the end result)