add dry run e2e tests #2653

Open
1 of 3 tasks
neolit123 opened this issue Feb 9, 2022 · 12 comments · May be fixed by kubernetes/kubernetes#126776


neolit123 commented Feb 9, 2022

kubeadm is currently missing integration / e2e tests for --dry-run.
this means that if we happen to break our dry-run support for a particular command (e.g. init), we will not know about it until users report it to us.

xref #2649

kubeadm has integration tests here:
https://github.com/kubernetes/kubernetes/tree/master/cmd/kubeadm/test/cmd
these tests execute a precompiled kubeadm binary to perform some checks and look for exit status 0.

we can use the same method for the init, join and reset tests with --dry-run, because dry-run is reentrant.
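
for illustration only, a check of that shape could look roughly like this (a shell sketch, not the actual Go harness under cmd/kubeadm/test/cmd; the ./kubeadm path is an assumption):

```bash
# sketch: run the precompiled binary with --dry-run and require exit status 0
# (./kubeadm is a placeholder for wherever the test harness builds the binary)
if ./kubeadm init --dry-run > /dev/null 2>&1; then
  echo "PASS: kubeadm init --dry-run exited 0"
else
  echo "FAIL: kubeadm init --dry-run exited non-zero"
  exit 1
fi
```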

but we cannot use this method for the upgrade * commands, because --dry-run for upgrade expects an existing cluster:

  • kubeconfig files in /etc/kubernetes/...
  • a running kube-apiserver
  • other running components...

to have everything in the same place we can add dry-run checks as part of a kinder e2e test workflow:
https://github.com/kubernetes/kubeadm/tree/main/kinder/ci/workflows

the workflow can look like the following (a rough command sketch follows the list):

  • allocate a kinder cluster with 1 node
  • call kubeadm init --dry-run on it (add --upload-certs and other special flags, how to test external CA?)
  • call kubeadm join --dry-run on it (add --control-plane, --certificate-key and other flags?)
  • call kubeadm reset --dry-run on it
  • call kubeadm init ... to create an actual k8s node
  • call kubeadm upgrade apply --dry-run to dry run the "primary node" upgrade of this node
  • call kubeadm upgrade node --dry-run to dry run the "secondary node" upgrade of this node
  • .. cleanup
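
a rough sketch of the commands the workflow tasks would run on the node(s); the flag values (<token>, <hash>, <version>, the endpoint) are placeholders and the exact flag sets are assumptions, the real tasks would live under kinder/ci/workflows:

```bash
# dry-run phases first, on the single node
kubeadm init --dry-run --upload-certs
kubeadm join <control-plane-endpoint>:6443 --dry-run \
  --token <token> --discovery-token-ca-cert-hash sha256:<hash>
kubeadm reset --dry-run --force

# then a real node, so the upgrade dry-runs have a cluster to inspect
kubeadm init                               # create an actual control-plane node
kubeadm upgrade apply --dry-run <version>  # "primary node" upgrade path
kubeadm upgrade node --dry-run             # "secondary node" upgrade path
kubeadm reset --force                      # cleanup
```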

tasks:

@neolit123 neolit123 added area/dry-run area/test help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. labels Feb 9, 2022
@neolit123 neolit123 added this to the v1.24 milestone Feb 9, 2022
@neolit123 neolit123 modified the milestones: v1.24, v1.25 Mar 29, 2022

SataQiu commented Apr 12, 2022

@neolit123 It seems that we cannot run kubeadm join --dry-run without an actual Kubernetes control-plane. 😅

The join phase will try to fetch the cluster-info ConfigMap even in dry-run mode.

I0412 10:01:39.066015     149 join.go:530] [preflight] Discovering cluster-info
I0412 10:01:39.067243     149 token.go:80] [discovery] Created cluster-info discovery client, requesting info from "127.0.0.1:6443"
I0412 10:01:39.101380     149 round_trippers.go:553] GET https://127.0.0.1:6443/api/v1/namespaces/kube-public/configmaps/cluster-info?timeout=10s  in 10 milliseconds
I0412 10:01:39.104282     149 token.go:217] [discovery] Failed to request cluster-info, will try again: Get "https://127.0.0.1:6443/api/v1/namespaces/kube-public/configmaps/cluster-info?timeout=10s": dial tcp 127.0.0.1:6443: connect: connection refused
I0412 10:01:45.026633     149 round_trippers.go:553] GET https://127.0.0.1:6443/api/v1/namespaces/kube-public/configmaps/cluster-info?timeout=10s  in 5 milliseconds
I0412 10:01:45.027445     149 token.go:217] [discovery] Failed to request cluster-info, will try again: Get "https://127.0.0.1:6443/api/v1/namespaces/kube-public/configmaps/cluster-info?timeout=10s": dial tcp 127.0.0.1:6443: connect: connection refused

Therefore, we need at least one worker node to complete the dry-run tests.
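
for example, with a two-node kinder cluster (one control-plane node plus one worker; names and discovery arguments below are placeholders, not the real workflow) the sequencing could be:

```bash
# sketch: join --dry-run only works once a real control plane is reachable
# on the control-plane node:
kubeadm init
# on the worker node, discovery of the cluster-info ConfigMap now succeeds:
kubeadm join <control-plane-endpoint>:6443 --dry-run \
  --token <token> --discovery-token-ca-cert-hash sha256:<hash>
```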


neolit123 commented Apr 12, 2022

Hm, i think we should fix that and use a fake CM or the dry-run client... in dry-run, other API calls already work like that. But if we use the dry-run client, it means we will probably have to skip validation of the CM as well 🤔

If you prefer, we can merge the initial test job without the join test, but it seems we have to fix it in k/k eventually.

EDIT: Or maybe ... join does need a control plane and it will fail later even if we use fake cluster-info?


SataQiu commented Apr 12, 2022

neolit123 replied:

i think the refactor is doable. we need to rename the "init" dry-run client to be a generic one and use it for "join" as well.
it's probably not that much work, but i haven't looked at all the details.

we can merge the current PR, but keep this issue open until we can do that in k/k after code freeze for 1.24.


neolit123 commented Apr 20, 2022

@SataQiu looks like the current e2e tests are flaky.

https://testgrid.k8s.io/sig-cluster-lifecycle-kubeadm#kubeadm-kinder-dryrun-latest

the error is:

[preflight] Some fatal errors occurred:
[ERROR CRI]: container runtime is not running: output: time="2022-04-20T18:56:52Z" level=fatal msg="connect: connect endpoint 'unix:///var/run/containerd/containerd.sock', make sure you are running as root and the endpoint has been started: context deadline exceeded"
, error: exit status 1

my guess is that we are running kubeadm ... inside the nodes before the container runtime has started. i can't remember if kinder's kubeadm-init action has a "wait for CRI" of sorts, but likely it does. one option would be to add e.g. sleep 10 before the first kubeadm init in task-03-init-dryrun.

EDIT: unclear if sleep is in the node images, possibly yes.

alternatively this could be a weird bug where containerd in the nodes is simply refusing to start for some reason, despite:

I0420 20:57:18.894578 107 initconfiguration.go:117] detected and using CRI socket: unix:///var/run/containerd/containerd.sock
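
instead of a fixed sleep, a small wait loop on the CRI socket could also work (a sketch only; the containerd socket path and the 30x2s retry budget are assumptions):

```bash
# sketch: poll the CRI endpoint for up to ~60s before the first kubeadm call
for i in $(seq 1 30); do
  crictl --runtime-endpoint unix:///var/run/containerd/containerd.sock info > /dev/null 2>&1 && break
  echo "waiting for containerd ($i/30) ..."
  sleep 2
done
```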


SataQiu commented Apr 26, 2022

> @SataQiu looks like the current e2e tests are flaky. [...]

It looks like the kubeadm-kinder-dryrun job was deleted by kubernetes/test-infra@c694052 😓


neolit123 commented Apr 26, 2022

Oh looks like @RA489 's PR deleted it in the 1.24 updates and i didn't see that..

Can you please send it again.


SataQiu commented Apr 26, 2022

> Oh looks like @RA489 's PR deleted it in the 1.24 updates and i didn't see that..
> Can you please send it again.

Sure!

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 25, 2022
@neolit123 neolit123 removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 25, 2022

RA489 commented Jul 25, 2022

/remove-lifecycle stale

@neolit123 neolit123 modified the milestones: v1.25, v1.26 Aug 25, 2022

neolit123 commented Oct 11, 2022 via email

@neolit123 neolit123 modified the milestones: v1.26, v1.27 Nov 21, 2022
@neolit123 neolit123 modified the milestones: v1.27, v1.28 Apr 17, 2023
@neolit123 neolit123 modified the milestones: v1.28, v1.29 Jul 21, 2023
@neolit123 neolit123 modified the milestones: v1.29, v1.30 Nov 1, 2023
@neolit123 neolit123 modified the milestones: v1.30, v1.31 Apr 5, 2024
@neolit123 neolit123 modified the milestones: v1.31, v1.32 Aug 7, 2024
@neolit123 neolit123 self-assigned this Aug 7, 2024