kinder: switch all workflows to containerd #2620

Conversation

neolit123
Member

@neolit123 neolit123 commented Dec 8, 2021

Switch all e2e test workflows to use containerd as the CR for the time
being. Dockershim is being removed from k/k, and until cri-dockerd
gains maintainer traction we should avoid setting it up ourselves
just for the sake of testing Docker as the CR.

Technically the dockershim removal only affects "latest" workflows,
but this change applies it to all k/k branches. If and when
cri-dockerd is testable, we can add a single workflow for Docker
as the CR.

xref #1412 (comment)

@neolit123 neolit123 added priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. area/test area/kinder Issues to track work in the kinder tool labels Dec 8, 2021
@neolit123 neolit123 added this to the v1.24 milestone Dec 8, 2021
@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Dec 8, 2021
@neolit123 neolit123 requested a review from pacoxu December 8, 2021 19:50
@k8s-ci-robot k8s-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Dec 8, 2021
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: neolit123

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Dec 8, 2021
@neolit123 neolit123 requested review from RA489 and removed request for yagonobre December 8, 2021 19:50
@neolit123
Member Author

@RA489 @pacoxu looking for LGTM

@neolit123
Member Author

neolit123 commented Dec 8, 2021

kinder-pull-control-plane-1:$ kubectl --kubeconfig=/etc/kubernetes/admin.conf exec -n=kube-system etcd-kinder-pull-control-plane-1 -- etcd --version
time="19:59:24" level=debug msg="Running: [docker exec kinder-pull-control-plane-1 kubectl --kubeconfig=/etc/kubernetes/admin.conf exec -n=kube-system etcd-kinder-pull-control-plane-1 -- etcd --version]"
Error: failed to exec action cluster-info: exit status 1
exit status 1

hm... could be a flake when calling --version in the etcd containers;
will test locally later...

@neolit123
Member Author

seems to not fail locally.
/retest

@fabriziopandini
Member

+1 to this change
We can eventually discuss at the next office hours whether to drop support for Docker entirely

@fabriziopandini
Member

/test pull-kubeadm-kinder-upgrade-latest

@neolit123
Member Author

neolit123 commented Dec 8, 2021

not reproducible locally

log:

# task-06-cluster-info
kinder do cluster-info --name=kinder-pull --loglevel=debug

time="23:58:40" level=debug msg="Running: [docker ps -q -a --no-trunc --filter label=io.k8s.sigs.kind.cluster --format {{.Label \"io.k8s.sigs.kind.cluster\"}}]"
time="23:58:40" level=debug msg="Reading container list for cluster kinder-pull"
time="23:58:40" level=debug msg="Running: [docker ps -q -a --no-trunc --filter label=io.k8s.sigs.kind.cluster=kinder-pull --format {{.Names}}]"
time="23:58:40" level=debug msg="Adding node kinder-pull-control-plane-3 to the cluster"
time="23:58:40" level=debug msg="Running: [docker inspect -f {{index .Config.Labels \"io.k8s.sigs.kind.role\"}} kinder-pull-control-plane-3]"
time="23:58:40" level=debug msg="Adding node kinder-pull-control-plane-1 to the cluster"
time="23:58:40" level=debug msg="Running: [docker inspect -f {{index .Config.Labels \"io.k8s.sigs.kind.role\"}} kinder-pull-control-plane-1]"
time="23:58:40" level=debug msg="Adding node kinder-pull-worker-1 to the cluster"
time="23:58:40" level=debug msg="Running: [docker inspect -f {{index .Config.Labels \"io.k8s.sigs.kind.role\"}} kinder-pull-worker-1]"
time="23:58:40" level=debug msg="Adding node kinder-pull-lb to the cluster"
time="23:58:40" level=debug msg="Running: [docker inspect -f {{index .Config.Labels \"io.k8s.sigs.kind.role\"}} kinder-pull-lb]"
time="23:58:40" level=debug msg="Adding node kinder-pull-worker-2 to the cluster"
time="23:58:40" level=debug msg="Running: [docker inspect -f {{index .Config.Labels \"io.k8s.sigs.kind.role\"}} kinder-pull-worker-2]"
time="23:58:40" level=debug msg="Adding node kinder-pull-control-plane-2 to the cluster"
time="23:58:40" level=debug msg="Running: [docker inspect -f {{index .Config.Labels \"io.k8s.sigs.kind.role\"}} kinder-pull-control-plane-2]"
time="23:58:41" level=debug msg="Reading cluster settings..."
time="23:58:41" level=info msg="Running action cluster-info..."

kinder-pull-control-plane-1:$ kubectl --kubeconfig=/etc/kubernetes/admin.conf get nodes -o=wide
time="23:58:41" level=debug msg="Running: [docker exec kinder-pull-control-plane-1 kubectl --kubeconfig=/etc/kubernetes/admin.conf get nodes -o=wide]"
NAME                          STATUS     ROLES                  AGE     VERSION                              INTERNAL-IP   EXTERNAL-IP   OS-IMAGE                                  KERNEL-VERSION     CONTAINER-RUNTIME
kinder-pull-control-plane-1   Ready      control-plane,master   7m23s   v1.24.0-alpha.0.135+8f91d09865b7ad   172.17.0.7    <none>        Ubuntu Eoan Ermine (development branch)   5.8.0-38-generic   containerd://1.3.0-20-g7af311b4
kinder-pull-control-plane-2   Ready      control-plane,master   5m42s   v1.24.0-alpha.0.135+8f91d09865b7ad   172.17.0.4    <none>        Ubuntu Eoan Ermine (development branch)   5.8.0-38-generic   containerd://1.3.0-20-g7af311b4
kinder-pull-control-plane-3   NotReady   control-plane,master   4m26s   v1.24.0-alpha.0.135+8f91d09865b7ad   172.17.0.5    <none>        Ubuntu Eoan Ermine (development branch)   5.8.0-38-generic   containerd://1.3.0-20-g7af311b4
kinder-pull-worker-1          Ready      <none>                 3m34s   v1.24.0-alpha.0.135+8f91d09865b7ad   172.17.0.6    <none>        Ubuntu Eoan Ermine (development branch)   5.8.0-38-generic   containerd://1.3.0-20-g7af311b4
kinder-pull-worker-2          Ready      <none>                 2m54s   v1.24.0-alpha.0.135+8f91d09865b7ad   172.17.0.3    <none>        Ubuntu Eoan Ermine (development branch)   5.8.0-38-generic   containerd://1.3.0-20-g7af311b4

kinder-pull-control-plane-1:$ kubectl --kubeconfig=/etc/kubernetes/admin.conf get pods --all-namespaces -o=wide
time="23:58:41" level=debug msg="Running: [docker exec kinder-pull-control-plane-1 kubectl --kubeconfig=/etc/kubernetes/admin.conf get pods --all-namespaces -o=wide]"
NAMESPACE     NAME                                                  READY   STATUS    RESTARTS        AGE     IP            NODE                          NOMINATED NODE   READINESS GATES
kube-system   coredns-64897985d-qx4pq                               1/1     Running   0               6m28s   192.168.0.3   kinder-pull-control-plane-1   <none>           <none>
kube-system   coredns-64897985d-vrp4q                               1/1     Running   0               6m28s   192.168.0.2   kinder-pull-control-plane-1   <none>           <none>
kube-system   etcd-kinder-pull-control-plane-1                      0/1     Running   0               7m17s   172.17.0.7    kinder-pull-control-plane-1   <none>           <none>
kube-system   etcd-kinder-pull-control-plane-2                      0/1     Running   0               64s     172.17.0.4    kinder-pull-control-plane-2   <none>           <none>
kube-system   etcd-kinder-pull-control-plane-3                      1/1     Running   0               36s     172.17.0.5    kinder-pull-control-plane-3   <none>           <none>
kube-system   kindnet-cmdzt                                         1/1     Running   0               6m28s   172.17.0.7    kinder-pull-control-plane-1   <none>           <none>
kube-system   kindnet-dnxz8                                         1/1     Running   0               4m27s   172.17.0.5    kinder-pull-control-plane-3   <none>           <none>
kube-system   kindnet-gx6r4                                         1/1     Running   0               5m43s   172.17.0.4    kinder-pull-control-plane-2   <none>           <none>
kube-system   kindnet-lp5v9                                         1/1     Running   0               3m35s   172.17.0.6    kinder-pull-worker-1          <none>           <none>
kube-system   kindnet-vhhj2                                         1/1     Running   0               2m55s   172.17.0.3    kinder-pull-worker-2          <none>           <none>
kube-system   kube-apiserver-kinder-pull-control-plane-1            0/1     Running   0               7m17s   172.17.0.7    kinder-pull-control-plane-1   <none>           <none>
kube-system   kube-apiserver-kinder-pull-control-plane-2            0/1     Running   0               5m42s   172.17.0.4    kinder-pull-control-plane-2   <none>           <none>
kube-system   kube-apiserver-kinder-pull-control-plane-3            1/1     Running   0               4m26s   172.17.0.5    kinder-pull-control-plane-3   <none>           <none>
kube-system   kube-controller-manager-kinder-pull-control-plane-1   0/1     Running   1 (5m31s ago)   7m17s   172.17.0.7    kinder-pull-control-plane-1   <none>           <none>
kube-system   kube-controller-manager-kinder-pull-control-plane-2   0/1     Running   0               5m42s   172.17.0.4    kinder-pull-control-plane-2   <none>           <none>
kube-system   kube-controller-manager-kinder-pull-control-plane-3   1/1     Running   0               4m26s   172.17.0.5    kinder-pull-control-plane-3   <none>           <none>
kube-system   kube-proxy-bzlxd                                      1/1     Running   0               3m35s   172.17.0.6    kinder-pull-worker-1          <none>           <none>
kube-system   kube-proxy-czvp5                                      1/1     Running   0               4m27s   172.17.0.5    kinder-pull-control-plane-3   <none>           <none>
kube-system   kube-proxy-hdgsr                                      1/1     Running   0               5m43s   172.17.0.4    kinder-pull-control-plane-2   <none>           <none>
kube-system   kube-proxy-l8t2w                                      1/1     Running   0               2m55s   172.17.0.3    kinder-pull-worker-2          <none>           <none>
kube-system   kube-proxy-mnghh                                      1/1     Running   0               6m28s   172.17.0.7    kinder-pull-control-plane-1   <none>           <none>
kube-system   kube-scheduler-kinder-pull-control-plane-1            0/1     Running   1 (5m31s ago)   7m17s   172.17.0.7    kinder-pull-control-plane-1   <none>           <none>
kube-system   kube-scheduler-kinder-pull-control-plane-2            1/1     Running   0               5m42s   172.17.0.4    kinder-pull-control-plane-2   <none>           <none>
kube-system   kube-scheduler-kinder-pull-control-plane-3            1/1     Running   0               4m25s   172.17.0.5    kinder-pull-control-plane-3   <none>           <none>

kinder-pull-control-plane-1:$ kubectl --kubeconfig=/etc/kubernetes/admin.conf get pods --all-namespaces -o=jsonpath={range .items[*]}{"\n"}{.metadata.name}{" << "}{range .spec.containers[*]}{.image}{", "}{end}{end}
time="23:58:42" level=debug msg="Running: [docker exec kinder-pull-control-plane-1 kubectl --kubeconfig=/etc/kubernetes/admin.conf get pods --all-namespaces -o=jsonpath={range .items[*]}{\"\\n\"}{.metadata.name}{\" << \"}{range .spec.containers[*]}{.image}{\", \"}{end}{end}]"

coredns-64897985d-qx4pq << k8s.gcr.io/coredns/coredns:v1.8.6, 
coredns-64897985d-vrp4q << k8s.gcr.io/coredns/coredns:v1.8.6, 
etcd-kinder-pull-control-plane-1 << k8s.gcr.io/etcd:3.5.1-0, 
etcd-kinder-pull-control-plane-2 << k8s.gcr.io/etcd:3.5.1-0, 
etcd-kinder-pull-control-plane-3 << k8s.gcr.io/etcd:3.5.1-0, 
kindnet-cmdzt << kindest/kindnetd:0.5.4, 
kindnet-dnxz8 << kindest/kindnetd:0.5.4, 
kindnet-gx6r4 << kindest/kindnetd:0.5.4, 
kindnet-lp5v9 << kindest/kindnetd:0.5.4, 
kindnet-vhhj2 << kindest/kindnetd:0.5.4, 
kube-apiserver-kinder-pull-control-plane-1 << k8s.gcr.io/kube-apiserver:v1.24.0-alpha.0.135_8f91d09865b7ad, 
kube-apiserver-kinder-pull-control-plane-2 << k8s.gcr.io/kube-apiserver:v1.24.0-alpha.0.135_8f91d09865b7ad, 
kube-apiserver-kinder-pull-control-plane-3 << k8s.gcr.io/kube-apiserver:v1.24.0-alpha.0.135_8f91d09865b7ad, 
kube-controller-manager-kinder-pull-control-plane-1 << k8s.gcr.io/kube-controller-manager:v1.24.0-alpha.0.135_8f91d09865b7ad, 
kube-controller-manager-kinder-pull-control-plane-2 << k8s.gcr.io/kube-controller-manager:v1.24.0-alpha.0.135_8f91d09865b7ad, 
kube-controller-manager-kinder-pull-control-plane-3 << k8s.gcr.io/kube-controller-manager:v1.24.0-alpha.0.135_8f91d09865b7ad, 
kube-proxy-bzlxd << k8s.gcr.io/kube-proxy:v1.24.0-alpha.0.135_8f91d09865b7ad, 
kube-proxy-czvp5 << k8s.gcr.io/kube-proxy:v1.24.0-alpha.0.135_8f91d09865b7ad, 
kube-proxy-hdgsr << k8s.gcr.io/kube-proxy:v1.24.0-alpha.0.135_8f91d09865b7ad, 
kube-proxy-l8t2w << k8s.gcr.io/kube-proxy:v1.24.0-alpha.0.135_8f91d09865b7ad, 
kube-proxy-mnghh << k8s.gcr.io/kube-proxy:v1.24.0-alpha.0.135_8f91d09865b7ad, 
kube-scheduler-kinder-pull-control-plane-1 << k8s.gcr.io/kube-scheduler:v1.24.0-alpha.0.135_8f91d09865b7ad, 
kube-scheduler-kinder-pull-control-plane-2 << k8s.gcr.io/kube-scheduler:v1.24.0-alpha.0.135_8f91d09865b7ad, 
kube-scheduler-kinder-pull-control-plane-3 << k8s.gcr.io/kube-scheduler:v1.24.0-alpha.0.135_8f91d09865b7ad, 

kinder-pull-control-plane-1:$ kubectl --kubeconfig=/etc/kubernetes/admin.conf exec -n=kube-system etcd-kinder-pull-control-plane-1 -- etcd --version
time="23:58:43" level=debug msg="Running: [docker exec kinder-pull-control-plane-1 kubectl --kubeconfig=/etc/kubernetes/admin.conf exec -n=kube-system etcd-kinder-pull-control-plane-1 -- etcd --version]"

kinder-pull-control-plane-1:$ Using etcdctl version: 3.5.1


kinder-pull-control-plane-1:$ kubectl --kubeconfig=/etc/kubernetes/admin.conf exec -n=kube-system etcd-kinder-pull-control-plane-1 -- etcdctl --endpoints=https://127.0.0.1:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/peer.crt --key=/etc/kubernetes/pki/etcd/peer.key member list
time="23:58:45" level=debug msg="Running: [docker exec kinder-pull-control-plane-1 kubectl --kubeconfig=/etc/kubernetes/admin.conf exec -n=kube-system etcd-kinder-pull-control-plane-1 -- etcdctl --endpoints=https://127.0.0.1:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/peer.crt --key=/etc/kubernetes/pki/etcd/peer.key member list]"
9e7030e5be8a2e49, started, kinder-pull-control-plane-2, https://172.17.0.4:2380, https://172.17.0.4:2379, false
d74f91ad2421c706, started, kinder-pull-control-plane-3, https://172.17.0.5:2380, https://172.17.0.5:2379, false
ed42dd1ddfd62d53, started, kinder-pull-control-plane-1, https://172.17.0.7:2380, https://172.17.0.7:2379, false
 completed!

in the failure logs, the etcd pods appear to be running after the upgrade, so this might be caused by some odd kubectl exec semantics vs. container state on a slower system (i.e. CI is slower) after the upgrade in the workflow...

it might be worth waiting in a loop for ~10 seconds for etcd --version to pass, which would imply the container is ready for exec.
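
something like this minimal sketch (waitForEtcdExec, runVersionCmd and the main driver are hypothetical names for illustration, not existing kinder APIs; the real fix would wrap the cp1.Command call shown below):

package main

import (
    "fmt"
    "time"
)

// waitForEtcdExec retries a command a fixed number of times, treating the
// first success as a signal that the etcd container is ready for exec.
// Hypothetical sketch; runVersionCmd stands in for running
// "kubectl ... exec ... -- etcd --version" via cp1.Command(...).RunAndCapture().
func waitForEtcdExec(runVersionCmd func() error, retries int, interval time.Duration) error {
    var err error
    for i := 0; i < retries; i++ {
        if err = runVersionCmd(); err == nil {
            return nil
        }
        time.Sleep(interval)
    }
    return fmt.Errorf("etcd container not ready for exec after %d attempts: %v", retries, err)
}

func main() {
    attempts := 0
    // simulate a container that only becomes ready on the third attempt
    run := func() error {
        attempts++
        if attempts < 3 {
            return fmt.Errorf("container not ready yet")
        }
        return nil
    }
    // 10 attempts, 1 second apart, roughly matches the ~10 second budget above
    if err := waitForEtcdExec(run, 10, time.Second); err != nil {
        fmt.Println(err)
    }
}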

if that also doesn't work, this has to be removed from cluster-info:

if c.ExternalEtcd() == nil {
    // NB. before v1.13 local etcd is listening on localhost only; after v1.13
    // local etcd is listening on localhost and on the advertise address; we are
    // using localhost to accommodate both the use cases
    etcdArgs := []string{
        "--kubeconfig=/etc/kubernetes/admin.conf", "exec", "-n=kube-system", fmt.Sprintf("etcd-%s", c.BootstrapControlPlane().Name()),
        "--",
    }
    // Get the version of etcdctl from the etcd binary
    versionArgs := append(etcdArgs, "etcd", "--version")
    lines, err := cp1.Command("kubectl", versionArgs...).RunAndCapture()
    if err != nil {
        return err
    }
    etcdctlVersion, err := parseEtcdctlVersion(lines)
    if err != nil {
        return err
    }
    cp1.Infof("Using etcdctl version: %s\n", etcdctlVersion)
    etcdArgs = append(etcdArgs, "etcdctl", "--endpoints=https://127.0.0.1:2379")
    // Append version specific etcdctl certificate flags
    if err := appendEtcdctlCertArgs(etcdctlVersion, &etcdArgs); err != nil {
        return err
    }
    etcdArgs = append(etcdArgs, "member", "list")
    if err := cp1.Command(
        "kubectl", etcdArgs...,
    ).RunWithEcho(); err != nil {
        return err
    }
} else {
    fmt.Println("using external etcd")
}
return nil
}
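
for reference, the parseEtcdctlVersion helper above extracts the version from the captured etcd --version output; a minimal sketch of what such a parser could look like, assuming the output contains a line like "etcd Version: 3.5.1" (the actual kinder implementation may differ):

package main

import (
    "errors"
    "fmt"
    "strings"
)

// parseEtcdctlVersion scans the captured "etcd --version" output for a line
// of the form "etcd Version: 3.5.1" and returns the bare version string.
// Sketch only; the real kinder helper may parse the output differently.
func parseEtcdctlVersion(lines []string) (string, error) {
    for _, line := range lines {
        if strings.HasPrefix(line, "etcd Version:") {
            return strings.TrimSpace(strings.TrimPrefix(line, "etcd Version:")), nil
        }
    }
    return "", errors.New("could not find an \"etcd Version:\" line in the output")
}

func main() {
    // sample output lines as printed by "etcd --version"
    lines := []string{"etcd Version: 3.5.1", "Go Version: go1.16.3"}
    v, err := parseEtcdctlVersion(lines)
    if err != nil {
        fmt.Println(err)
        return
    }
    fmt.Println("Using etcdctl version:", v)
}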

@neolit123
Member Author

/retest

During CI the cluster-info fails with 'etcd --version' exiting with
status 1, but locally this cannot be reproduced.

Retry the command 10 times to try to avoid flakes.
@neolit123 neolit123 force-pushed the 1.24-switch-all-workflows-to-containerd branch from b21774a to 6fe0dde on December 8, 2021 22:51
@neolit123
Member Author

neolit123 commented Dec 8, 2021

i think this may have been a weird kubekins problem, because i saw some kubekins revert PRs in test-infra.

regardless, the cluster-info code now has retries:
https://github.com/kubernetes/kubeadm/pull/2620/files#diff-a1d1aae3a45cc803e2d016143cf9457f08826064f72ab78540706945a8bc2984R70-R84

@neolit123
Member Author

self-LGTM-ing since all docker jobs will start failing.....

@neolit123 neolit123 added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Dec 8, 2021
@k8s-ci-robot k8s-ci-robot merged commit 26157e6 into kubernetes:main Dec 8, 2021