Compare commits

...

14 Commits

Author SHA1 Message Date
k8s-infra-cherrypick-robot
03828c9ffa wait for control plane node to become ready after joining (#12922)
When joining a control plane node and "upgrading" the cluster setup (for
example, to update etcd addresses after adding a new etcd) in the same
playbook run, the node can take a bit of time to become ready after
joining.
This triggers a kubeadm preflight check (ControlPlaneNodesReady) in
kubeadm upgrade, which is run directly after the join tasks.

Add a configurable wait for the control plane node to become Ready to
fix this race condition.

Co-authored-by: Max Gautier <mg@max.gautier.name>
2026-01-29 15:39:49 +05:30
k8s-infra-cherrypick-robot
7f915b333b etcd-certs: only change necessary permissions (#12913)
We currently **recursively** set the permissions of /etc/ssl/etcd/ssl
(default path) to 700. But this removes group permission from the files
under it, and certain composents (like calio with etcd datastore) rely
on it ; thus, the upgrade of a cluster can fail because the
calico-kube-controller can't access the certs, and thus the etcd.

This works in other case because as far as I can tell, the apiserver
which do access the etcd run as root (the owner of the files, not just
the "group owner")

We also for some reasons do this twice.

Only create the etcd cert directory with the correct permissions once,
not recursively.

Co-authored-by: Max Gautier <mg@max.gautier.name>
2026-01-27 20:29:51 +05:30
k8s-infra-cherrypick-robot
28af3e80e8 cri-o: fix duplicate top-level "auths" keys in registry config template (#12884)
The config.json.j2 template was generating invalid JSON when multiple
crio_registry_auth entries were defined, resulting in multiple top-level
"auths" objects being rendered, e.g.:

{
  "auths": { "registry1": { "auth": "xxxx" } },
  "auths": { "registry2": { "auth": "yyyy" } }
}

This change moves the loop inside the "auths" object so that all registries
are rendered as siblings under a single "auths" key, producing valid JSON:

{
  "auths": {
    "registry1": { "auth": "xxxx" },
    "registry2": { "auth": "yyyy" }
  }
}

Co-authored-by: Martin Cahill <martin.cahill@gmail.com>
2026-01-20 19:58:52 +05:30
k8s-infra-cherrypick-robot
92d05ad621 Fix ansible-lint config error (#12863)
Co-authored-by: Max Gautier <mg@max.gautier.name>
2026-01-13 20:35:38 +05:30
k8s-infra-cherrypick-robot
b01c407387 Changed to use first_kube_control_plane to parse kubeadm_certificate_key (#12758)
Co-authored-by: Fredrik Liv <fredrik.liv@elastisys.com>
Co-authored-by: nvalembois <nvalembois@live.com>
2025-12-02 07:28:26 -08:00
k8s-infra-cherrypick-robot
17d21676e9 Fix calico etcd mode networkpolicy RBAC (#12753)
Co-authored-by: Chad Swenson <chadswen@gmail.com>
2025-11-28 08:36:21 -08:00
k8s-infra-cherrypick-robot
7a27aef736 [release-2.27] CI: enable unsafe_show_logs == true by default (#12726)
* CI: enable unsafe_show_logs == true by default

* Deduplicate defaults vars (unsafe_show_logs)

---------

Co-authored-by: Max Gautier <mg@max.gautier.name>
2025-11-19 23:57:59 -08:00
k8s-infra-cherrypick-robot
406ea25217 Fix breakage when ignoring all kubeadm preflight errors (#12618)
kubeadm errors out if 'all' is specified with specific checks, so check
that case when we add hardcoded checks.

Add a test to catch regression.

Co-authored-by: Max Gautier <mg@max.gautier.name>
2025-11-17 22:27:37 -08:00
Max Gautier
87597b044d galaxy.yml: up to next patch version (#12697) 2025-11-17 21:27:38 -08:00
Max Gautier
16e3670dd4 Remove etcd member by peerURLs (#12691)
The way to obtain the IP of a particular member is convoluted and depend
on multiple variables. The match is also textual and it's not clear
against what we're matching

It's also broken for etcd member which are not also Kubernetes nodes,
because the "Lookup node IP in kubernetes" task will fail and abort the
play.

Instead, match against 'peerURLs', which does not need new variable, and
use json output.

- Add testcase for etcd removal on external etcd
2025-11-17 02:53:40 -08:00
Max Gautier
c06b669ae6 [release-2.27] Update pre-commit hooks (#12698)
* Update pre-commit hooks

* CI: Put pre-commit cache under CI_PROJECT_DIR (#11929)

* CI: Put pre-commit cache under CI_PROJECT_DIR

Apparently gitlab-runner can't cache stuff outside of the project
directory.

Put the cache under CI_PROJECT_DIR to make it work (which also means we
need to ignore it from ansible-lint).

Also update the pre-commit image while we're at it.

Link: https://gitlab.com/gitlab-org/gitlab/-/issues/14151

* update ansible-lint pre-commit
2025-11-16 01:11:36 -08:00
k8s-infra-cherrypick-robot
f3354ce2c9 calico: update calico-kube-controller manifest (#12481)
Co-authored-by: Cyclinder Kuo <kuocyclinder@gmail.com>
2025-08-28 00:21:10 -07:00
k8s-infra-cherrypick-robot
7cb6b07c44 Fix: Change "empty" definition for PodSecurity Admission configuration (#12476)
Fixes a bug where `kube-apiserver` fails to start if the PodSecurity
configuration file doesn't have the `apiVersion` and `kind` keys.

Signed-off-by: Alejandro Macedo <alex.macedopereira@gmail.com>
Co-authored-by: Alejandro Macedo <alex.macedopereira@gmail.com>
2025-08-26 09:22:10 -07:00
ChengHao Yang
9505e74d6e Fix: pre-commit failing test (#12484)
Signed-off-by: ChengHao Yang <17496418+tico88612@users.noreply.github.com>
2025-08-26 09:02:11 -07:00
20 changed files with 68 additions and 79 deletions

View File

@@ -1,5 +1,4 @@
---
parseable: true
skip_list:
# see https://docs.ansible.com/ansible-lint/rules/default_rules.html for a list of all default rules
@@ -38,5 +37,6 @@ exclude_paths:
- venv
- .github
- .ansible
- .cache
mock_modules:
- gluster.gluster.gluster_volume

View File

@@ -3,15 +3,16 @@ pre-commit:
stage: test
tags:
- ffci
image: 'ghcr.io/pre-commit-ci/runner-image@sha256:aaf2c7b38b22286f2d381c11673bec571c28f61dd086d11b43a1c9444a813cef'
image: 'ghcr.io/pre-commit-ci/runner-image@sha256:fe01a6ec51b298412990b88627c3973b1146c7304f930f469bafa29ba60bcde9'
variables:
PRE_COMMIT_HOME: /pre-commit-cache
PRE_COMMIT_HOME: ${CI_PROJECT_DIR}/.cache/pre-commit
script:
- pre-commit run --all-files
cache:
key: pre-commit-all
key: pre-commit-2
paths:
- /pre-commit-cache
- ${PRE_COMMIT_HOME}
when: 'always'
needs: []
vagrant-validate:

View File

@@ -1,7 +1,7 @@
---
repos:
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v5.0.0
rev: v6.0.0
hooks:
- id: check-added-large-files
- id: check-case-conflict
@@ -15,7 +15,7 @@ repos:
- id: trailing-whitespace
- repo: https://github.com/adrienverge/yamllint.git
rev: v1.35.1
rev: v1.37.1
hooks:
- id: yamllint
args: [--strict]
@@ -27,7 +27,7 @@ repos:
exclude: "^.github|(^docs/_sidebar\\.md$)"
- repo: https://github.com/shellcheck-py/shellcheck-py
rev: v0.10.0.1
rev: v0.11.0.1
hooks:
- id: shellcheck
args: ["--severity=error"]
@@ -35,16 +35,17 @@ repos:
files: "\\.sh$"
- repo: https://github.com/ansible/ansible-lint
rev: v24.12.2
rev: v25.11.0
hooks:
- id: ansible-lint
additional_dependencies:
- ansible
- jmespath==1.0.1
- netaddr==1.3.0
- distlib
- repo: https://github.com/golangci/misspell
rev: v0.6.0
rev: v0.7.0
hooks:
- id: misspell
exclude: "OWNERS_ALIASES$"

View File

@@ -2,7 +2,7 @@
namespace: kubernetes_sigs
description: Deploy a production ready Kubernetes cluster
name: kubespray
version: 2.27.1
version: 2.27.2
readme: README.md
authors:
- The Kubespray maintainers (https://kubernetes.slack.com/channels/kubespray)

View File

@@ -30,8 +30,3 @@ override_system_hostname: true
is_fedora_coreos: false
skip_http_proxy_on_os_packages: false
# If this is true, debug information will be displayed but
# may contain some private data, so it is recommended to set it to false
# in the production environment.
unsafe_show_logs: false

View File

@@ -1,16 +1,16 @@
{% if crio_registry_auth is defined and crio_registry_auth|length %}
{
{% for reg in crio_registry_auth %}
"auths": {
{% for reg in crio_registry_auth %}
"{{ reg.registry }}": {
"auth": "{{ (reg.username + ':' + reg.password) | string | b64encode }}"
}
{% if not loop.last %}
},
},
{% else %}
}
}
{% endif %}
{% endfor %}
}
}
{% else %}
{}

View File

@@ -18,7 +18,6 @@ etcd_backup_retention_count: -1
force_etcd_cert_refresh: true
etcd_config_dir: /etc/ssl/etcd
etcd_cert_dir: "{{ etcd_config_dir }}/ssl"
etcd_cert_dir_mode: "0700"
etcd_cert_group: root
# Note: This does not set up DNS entries. It simply adds the following DNS
# entries to the certificate
@@ -114,11 +113,6 @@ etcd_retries: 4
# https://groups.google.com/a/kubernetes.io/g/dev/c/B7gJs88XtQc/m/rSgNOzV2BwAJ?utm_medium=email&utm_source=footer
etcd_experimental_initial_corrupt_check: true
# If this is true, debug information will be displayed but
# may contain some private data, so it is recommended to set it to false
# in the production environment.
unsafe_show_logs: false
# Enable distributed tracing
# https://etcd.io/docs/v3.5/op-guide/monitoring/#distributed-tracing
etcd_experimental_enable_distributed_tracing: false

View File

@@ -5,8 +5,7 @@
group: "{{ etcd_cert_group }}"
state: directory
owner: "{{ etcd_owner }}"
mode: "{{ etcd_cert_dir_mode }}"
recurse: true
mode: "0700"
- name: "Gen_certs | create etcd script dir (on {{ groups['etcd'][0] }})"
file:
@@ -145,15 +144,6 @@
- ('k8s_cluster' in group_names) and
sync_certs | default(false) and inventory_hostname not in groups['etcd']
- name: Gen_certs | check certificate permissions
file:
path: "{{ etcd_cert_dir }}"
group: "{{ etcd_cert_group }}"
state: directory
owner: "{{ etcd_owner }}"
mode: "{{ etcd_cert_dir_mode }}"
recurse: true
# This is a hack around the fact kubeadm expect the same certs path on all kube_control_plane
# TODO: fix certs generation to have the same file everywhere
# OR work with kubeadm on node-specific config

View File

@@ -27,11 +27,6 @@ vsphere_csi_aggressive_node_not_ready_timeout: 300
vsphere_csi_node_affinity: {}
# If this is true, debug information will be displayed but
# may contain some private data, so it is recommended to set it to false
# in the production environment.
unsafe_show_logs: false
# https://github.com/kubernetes-sigs/vsphere-csi-driver/blob/master/docs/book/features/volume_snapshot.md#how-to-enable-volume-snapshot--restore-feature-in-vsphere-csi-
# according to the above link , we can controler the block-volume-snapshot parameter
vsphere_csi_block_volume_snapshot: false

View File

@@ -30,6 +30,8 @@ spec:
operator: Exists
- key: node-role.kubernetes.io/control-plane
effect: NoSchedule
- key: node-role.kubernetes.io/master
effect: NoSchedule
{% if policy_controller_extra_tolerations is defined %}
{{ policy_controller_extra_tolerations | list | to_nice_yaml(indent=2) | indent(8) }}
{% endif %}
@@ -59,6 +61,8 @@ spec:
- /usr/bin/check-status
- -r
periodSeconds: 10
securityContext:
runAsNonRoot: true
env:
- name: LOG_LEVEL
value: {{ calico_policy_controller_log_level }}
@@ -68,6 +72,8 @@ spec:
- name: DATASTORE_TYPE
value: kubernetes
{% else %}
- name: ENABLED_CONTROLLERS
value: policy,namespace,serviceaccount,workloadendpoint,node
- name: ETCD_ENDPOINTS
value: "{{ etcd_access_addresses }}"
- name: ETCD_CA_CERT_FILE

View File

@@ -6,27 +6,21 @@ metadata:
namespace: kube-system
rules:
{% if calico_datastore == "etcd" %}
- apiGroups:
- ""
- extensions
# Pods are monitored for changing labels.
# The node controller monitors Kubernetes nodes.
# Namespace and serviceaccount labels are used for policy.
- apiGroups: [""]
resources:
- pods
- namespaces
- networkpolicies
- nodes
- namespaces
- serviceaccounts
verbs:
- watch
- list
- get
- apiGroups:
- ""
resources:
- nodes
verbs:
- get
- apiGroups:
- networking.k8s.io
# Watch for changes to Kubernetes NetworkPolicies.
- apiGroups: ["networking.k8s.io"]
resources:
- networkpolicies
verbs:
@@ -67,6 +61,7 @@ rules:
- blockaffinities
- ipamblocks
- ipamhandles
- tiers
verbs:
- get
- list

View File

@@ -2,6 +2,9 @@
# disable upgrade cluster
upgrade_cluster_setup: false
# Number of retries (with 5 seconds interval) to check that new control plane nodes
# are in Ready condition after joining
control_plane_node_become_ready_tries: 24
# By default the external API listens on all interfaces, this can be changed to
# listen on a specific address/interface.
# NOTE: If you specific address/interface and use loadbalancer_apiserver_localhost

View File

@@ -24,11 +24,11 @@
- name: Parse certificate key if not set
set_fact:
kubeadm_certificate_key: "{{ hostvars[groups['kube_control_plane'][0]]['kubeadm_upload_cert'].stdout_lines[-1] | trim }}"
kubeadm_certificate_key: "{{ hostvars[first_kube_control_plane]['kubeadm_upload_cert'].stdout_lines[-1] | trim }}"
run_once: true
when:
- hostvars[groups['kube_control_plane'][0]]['kubeadm_upload_cert'] is defined
- hostvars[groups['kube_control_plane'][0]]['kubeadm_upload_cert'] is not skipped
- hostvars[first_kube_control_plane]['kubeadm_upload_cert'] is defined
- hostvars[first_kube_control_plane]['kubeadm_upload_cert'] is not skipped
- name: Create kubeadm ControlPlane config
template:
@@ -99,3 +99,18 @@
when:
- inventory_hostname != first_kube_control_plane
- kubeadm_already_run is not defined or not kubeadm_already_run.stat.exists
- name: Wait for new control plane nodes to be Ready
when: kubeadm_already_run.stat.exists
run_once: true
command: >
{{ kubectl }} get nodes --selector node-role.kubernetes.io/control-plane
-o jsonpath-as-json="{.items[*].status.conditions[?(@.type == 'Ready')]}"
register: control_plane_node_ready_conditions
retries: "{{ control_plane_node_become_ready_tries }}"
delay: 5
delegate_to: "{{ groups['kube_control_plane'][0] }}"
until: >
control_plane_node_ready_conditions.stdout
| from_json | selectattr('status', '==', 'True')
| length == (groups['kube_control_plane'] | length)

View File

@@ -1,6 +1,6 @@
{% if kube_pod_security_use_default %}
apiVersion: pod-security.admission.config.k8s.io/v1
kind: PodSecurityConfiguration
{% if kube_pod_security_use_default %}
defaults:
enforce: "{{ kube_pod_security_default_enforce }}"
enforce-version: "{{ kube_pod_security_default_enforce_version }}"

View File

@@ -86,13 +86,13 @@
- not kubelet_conf.stat.exists
vars:
ignored:
- DirAvailable--etc-kubernetes-manifests
- "{{ 'DirAvailable--etc-kubernetes-manifests' if 'all' not in kubeadm_ignore_preflight_errors }}"
- "{{ kubeadm_ignore_preflight_errors }}"
command: >-
timeout -k {{ kubeadm_join_timeout }} {{ kubeadm_join_timeout }}
{{ bin_dir }}/kubeadm join
--config {{ kube_config_dir }}/kubeadm-client.conf
--ignore-preflight-errors={{ ignored | flatten | join(',') }}
--ignore-preflight-errors={{ ignored | select | flatten | join(',') }}
--skip-phases={{ kubeadm_join_phases_skip | join(',') }}
- name: Update server field in kubelet kubeconfig

View File

@@ -5,7 +5,9 @@ download_cache_dir: /tmp/kubespray_cache
# If this is true, debug information will be displayed but
# may contain some private data, so it is recommended to set it to false
# in the production environment.
unsafe_show_logs: false
# false by default, unless we're running in CI. (CI_PROJECT_URL should be globally unique even if kubespray happens to run
# in gitlab-ci in other contexts
unsafe_show_logs: "{{ lookup('env', 'CI_PROJECT_URL') == 'https://gitlab.com/kargo-ci/kubernetes-sigs-kubespray' }}"
# do not delete remote cache files after using them
# NOTE: Setting this parameter to TRUE is only really useful when developing kubespray

View File

@@ -1,16 +1,4 @@
---
- name: Lookup node IP in kubernetes
command: >
{{ kubectl }} get nodes {{ node }}
-o jsonpath-as-json='{.status.addresses[?(@.type=="InternalIP")].address}'
register: k8s_node_ips
changed_when: false
when:
- groups['kube_control_plane'] | length > 0
- ip is not defined
- access_ip is not defined
delegate_to: "{{ groups['kube_control_plane'] | first }}"
- name: Remove etcd member from cluster
environment:
ETCDCTL_API: "3"
@@ -21,20 +9,18 @@
delegate_to: "{{ groups['etcd'] | first }}"
block:
- name: Lookup members infos
command: "{{ bin_dir }}/etcdctl member list"
command: "{{ bin_dir }}/etcdctl member list -w json"
register: etcd_members
changed_when: false
check_mode: false
tags:
- facts
- name: Remove member from cluster
vars:
node_ip: "{{ ip if ip is defined else (access_ip if access_ip is defined else (k8s_node_ips.stdout | from_json)[0]) }}"
command:
argv:
- "{{ bin_dir }}/etcdctl"
- member
- remove
- "{{ ((etcd_members.stdout_lines | select('contains', '//' + node_ip + ':'))[0] | split(','))[0] }}"
- "{{ '%x' | format(((etcd_members.stdout | from_json).members | selectattr('peerURLs.0', '==', etcd_peer_url))[0].ID) }}"
register: etcd_removal_output
changed_when: "'Removed member' in etcd_removal_output.stdout"

View File

@@ -9,3 +9,7 @@ etcd_deployment_type: kubeadm
kubeadm_certificate_key: 3998c58db6497dd17d909394e62d515368c06ec617710d02edea31c06d741085
skip_non_kubeadm_warning: true
kube_asymmetric_encryption_algorithm: "RSA-4096"
# This test the variable usage, it is not a prerequisite of the test itself
kubeadm_ignore_preflight_errors:
- all

View File

@@ -0,0 +1,2 @@
REMOVE_NODE_CHECK=true
REMOVE_NODE_NAME=etcd[2]

View File

@@ -130,7 +130,7 @@ run_playbook tests/testcases/100_check-k8s-conformance.yml
# Test node removal procedure
if [ "${REMOVE_NODE_CHECK}" = "true" ]; then
run_playbook remove-node.yml -e skip_confirmation=yes -e node=${REMOVE_NODE_NAME}
run_playbook remove-node.yml -e skip_confirmation=yes -e node="${REMOVE_NODE_NAME}"
fi
# Clean up at the end, this is to allow stage1 tests to include cleanup test