kubeadm: refactor the dry-run logic #126776

neolit123 · 2024-08-19T06:48:46Z

What type of PR is this?

/kind cleanup

What this PR does / why we need it:

The current dryrun client implemnetation is suboptimal
and sparse. It has the following problems:

When an object CREATE or UPDATE reaches the default dryrun client
the operation is a NO-OP, which means subsequent GET calls must
fully emulate the object that exists in the store.
There are multiple implmentations of a DryRunGetter interface
such the one in init_dryrun.go but there are no implementations
for reset, upgrade, join.
There is a specific DryRunGetter that is backed by a real
client in clientbacked_dryrun.go, but this is used for upgrade
and does not work in conjuction with a fake client.

This commit does the following changes:

Removes all existing dryrun.go implementations.
Add a new DryRun implementation in dryrun.go that implements
3 clients - fake clientset, real clientset, real dynamic client.
The DryRun object uses the method chaining pattern.
Allows the user opt-in into real clients only if needed, by passing
a real kubeconfig. By default only constructs a fake client.
The default reactor chain for the fake client, always logs the
object action, then for GET or LIST actions attempts to use the
real dynamic client to get the object. If a real object does not
exist it attempts to get the object from the fake object store.
The user can prepend or append reactors to the chain.
All known needed reactors for operations during init, join,
reset, upgrade are added as methods of the DryRun struct.
Adds detailed unit test for the DryRun struct and its methods
including reactors.

Additional changes:

Use the new DryRun implementation in all command workflows -
init, join, reset, upgrade.
Ensure that --dry-run works even if there is no active cluster
by returning faked objects. For join, a faked cluster-info
with a fake bootstrap token and CA are used.

Which issue(s) this PR fixes:

Fixes kubernetes/kubeadm#2653
Fixes kubernetes/kubeadm#1932

Special notes for your reviewer:

NONE

Does this PR introduce a user-facing change?

kubeadm: increased the verbosity of API client dry-run actions during the subcommands "init", "join", "upgrade" and "reset". Allowed dry-run on 'kubeadm join' even if there is no existing cluster by utilizing a faked, in-memory cluster-info ConfigMap.

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

k8s-ci-robot · 2024-08-19T06:49:24Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: neolit123

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~cmd/kubeadm/OWNERS~~ [neolit123]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

neolit123 · 2024-08-19T06:50:41Z

/hold for

kubeadm: implement 'kubeadm upgrade apply phase' #126032

neolit123 · 2024-08-19T11:48:54Z

/triage accepted
/priority backlog

neolit123 · 2024-09-06T12:27:57Z

/retitle kubeadm: refactor the dry-run logic

this is ready for review

neolit123 · 2024-09-10T15:20:45Z

rebased

carlory · 2024-09-11T06:11:51Z

cmd/kubeadm/app/util/apiclient/dryrun.go

+//   - Log the action.
+//   - If the action is not GET or LIST just use the fake client store to write it, unless
+//     a user reactor was added with PrependReactor() or AppendReactor().
+//   - Attempt to GET or LIST using the real dynamic client.


real dynamic client to get the object. If a real object does not
exist it attempts to get the object from the fake object store.

The above message is copied from the PR's description. It‘s not truth when the real client is instantiated and the given object doesn't exist in the cluster, it will mark the action as handled.

// handleGetAction tries to handle all GET actions with the dynamic client. func (d *DryRun) handleGetAction(action testing.GetAction) (bool, runtime.Object, error) { if d.dynamicClient == nil { return false, nil, errors.New("dynamicClient is nil") } unstructuredObj, err := d.dynamicClient. Resource(action.GetResource()). Namespace(action.GetNamespace()). Get(context.Background(), action.GetName(), metav1.GetOptions{}) if err != nil { return true, nil, err } newObj, err := d.decodeUnstructuredIntoAPIObject(action, unstructuredObj) if err != nil { return true, nil, err } return true, newObj, err }

According to the implementation, it attempts to GET or LIST using the real dynamic client if it is instantiated, otherwise tries to GET or LIST the object from the fake client store. Is it desired?

Sorry, I didn't see the caller.

handled, obj, err := d.handleGetAction(getAction) if err != nil { fmt.Fprintln(d.writer, "[dryrun] Real object does not exist. "+ "Attempting to GET from followup reactors or from the fake client tracker") return false, nil, nil }

Please ingore above comments.

carlory · 2024-09-11T06:39:41Z

cmd/kubeadm/app/cmd/init.go

-	if err != nil {
-		return nil, errors.Wrapf(err, "unable to get internal Kubernetes Service IP from the given service CIDR (%s)", d.cfg.Networking.ServiceSubnet)
+	dryRun := apiclient.NewDryRun()
+	if err := dryRun.WithKubeConfigFile(d.KubeConfigPath()); err != nil {


In dry-run mode, d.KubeConfigPath() always returns the kubeconfig path from a kubeadm tmp dir.

I'm not sure whether the created client is able to talk with the real cluster?

Do sub phases under init phase need a real cluster when a user run a sub phase with the --dry-run flag.

I'm not sure whether the created client is able to talk with the real cluster?

yes, it's a generated admin.conf, it just resides in the tmp dir.

Do sub phases under init phase need a real cluster when a user run a sub phase with the --dry-run flag.

that's a good Q because there is a problem with this it seems.
the phases can work with dry-run but the admin.conf in the TMP dir doesn't exist.

here is what happens:

sudo kubeadm init phase upload-config all --dry-run error execution phase upload-config/kubeadm: failed to load kubeconfig: open /etc/kubernetes/tmp/kubeadm-init-dryrun3932008713/admin.conf: no such file or directory

so the CA and admin.conf don't exist for this scoped phase dry-run.
but sudo kubeadm init --dry-run as a whole would work as that tmp dir admin.conf will exist ad KubeConfigPath will return it.

even if the user runs:

sudo kubeadm init phase certs ca sudo kubeadm init phase kubeconfig admin sudo kubeadm init phase upload-config all --dry-run error execution phase upload-config/kubeadm: failed to load kubeconfig: open /etc/kubernetes/tmp/kubeadm-init-dryrun3932008713/admin.conf: no such file or directory

it would fail as the non-dry-run commands generated a CA/kubeconfig in the default location /etc/kubernetes/

but the --dry-run is looking for an admin.conf that exists in the tmp dir.

i think to avoid this problem the logic in init.go can check if the tmp dir file exists and if it doesn't it can fall back to the default /etc/kubernetes/admin.conf

i need to test this idea or find a better one..

@carlory so i approached this differently. the problem is that a tmp dir is used for many files not only the admin.conf.

i think the only way to allow running individual phases in dry-run mode is to allow the user to specify the dry-run directory. we already had an env variable for that called KUBEADM_INIT_DRYRUN_DIR. i just added new ones for UPGRADE and JOIN as well. so here is how this can be used.

lubo@3B69Q73:~/go/src/k8s.io/kubernetes$ sudo kubeadm init phase certs ca --dry-run W0925 15:27:34.028290 147394 constants.go:610] Using dry-run directory /etc/kubernetes/tmp/kubeadm-init-dryrun1123083720. To override it, set the environment variable KUBEADM_INIT_DRYRUN_DIR [certs] Using existing ca certificate authority lubo@3B69Q73:~/go/src/k8s.io/kubernetes$ sudo ls /etc/kubernetes/tmp/kubeadm-init-dryrun1123083720 ca.crt ca.key lubo@3B69Q73:~/go/src/k8s.io/kubernetes$ sudo KUBEADM_INIT_DRYRUN_DIR=/etc/kubernetes/tmp/kubeadm-init-dryrun1123083720 kubeadm init phase kubeconfig all --dry-run [kubeconfig] Using kubeconfig folder "/etc/kubernetes/tmp/kubeadm-init-dryrun1123083720" [kubeconfig] Writing "admin.conf" kubeconfig file [kubeconfig] Writing "super-admin.conf" kubeconfig file [kubeconfig] Writing "kubelet.conf" kubeconfig file [kubeconfig] Writing "controller-manager.conf" kubeconfig file [kubeconfig] Writing "scheduler.conf" kubeconfig file

so when not setting the env var a random dir is generted, but if the env var is set it can be used for phase dry-run.

i will push an update in a bit.

cmd/kubeadm/app/cmd/join.go

carlory · 2024-09-11T06:55:13Z

cmd/kubeadm/app/discovery/token/token.go

-	return retrieveValidatedConfigInfo(nil, cfg, constants.DiscoveryRetryInterval, timeout)
+func RetrieveValidatedConfigInfo(dryRunClient clientset.Interface, cfg *kubeadmapi.Discovery, timeout time.Duration) (*clientcmdapi.Config, error) {
+	isDryRun := dryRunClient != nil
+	isTesting := false


What's the purpose of isTesting? Can we remove it?

it's a left over from when i was testing with different ways to make the changes in token.go. i will remove it.

actually, it has a purpose. without it the function retrieveValidatedConfigInfo is difficult to unit test.
during tests retrieveValidatedConfigInfo gets isDryRun = false and isTesting = true, that results in no loading of additional kubeconfig files (retrieveValidatedConfigInfo loads kubeconfig files). during tests client is a fake client and during real usage client is a dryRun client. getClusterInfo() is called from retrieveValidatedConfigInfo() which complicates it further.

ideas to improve this are welcome, but i think it might be suitable for a new PR that would refactor / break apart retrieveValidatedConfigInfo. perhaps in multiple functions.

cmd/kubeadm/app/discovery/discovery.go

carlory · 2024-09-11T07:24:26Z

cmd/kubeadm/app/cmd/join.go

+	var err error
+
+	client := dryRunClient
+	if dryRunClient == nil {


Is it possible to move the creating client logic to InitCfg() ?

yes, makes sense

moved this to fetchInitConfigurationFromJoinConfiguration as that func also needs tlsBootstrapCfg.

neolit123 · 2024-09-24T13:50:33Z

just rebased.
still need to address your comments @carlory

The current dryrun client implemnetation is suboptimal and sparse. It has the following problems: - When an object CREATE or UPDATE reaches the default dryrun client the operation is a NO-OP, which means subsequent GET calls must fully emulate the object that exists in the store. - There are multiple implmentations of a DryRunGetter interface such the one in init_dryrun.go but there are no implementations for reset, upgrade, join. - There is a specific DryRunGetter that is backed by a real client in clientbacked_dryrun.go, but this is used for upgrade and does not work in conjuction with a fake client. This commit does the following changes: - Removes all existing *dryrun*.go implementations. - Add a new DryRun implementation in dryrun.go that implements 3 clients - fake clientset, real clientset, real dynamic client. - The DryRun object uses the method chaining pattern. - Allows the user opt-in into real clients only if needed, by passing a real kubeconfig. By default only constructs a fake client. - The default reactor chain for the fake client, always logs the object action, then for GET or LIST actions attempts to use the real dynamic client to get the object. If a real object does not exist it attempts to get the object from the fake object store. - The user can prepend or append reactors to the chain. - All known needed reactors for operations during init, join, reset, upgrade are added as methods of the DryRun struct. - Adds detailed unit test for the DryRun struct and its methods including reactors. Additional changes: - Use the new DryRun implementation in all command workflows - init, join, reset, upgrade. - Ensure that --dry-run works even if there is no active cluster by returning faked objects. For join, a faked cluster-info with a fake bootstrap token and CA are used.

neolit123 · 2024-09-27T09:10:07Z

looking for more reviews.

carlory · 2024-09-27T09:55:52Z

Thanks for your update. It looks good to me.

/lgtm

Looking another reviewer for the final approval.
/hold

k8s-ci-robot · 2024-09-27T09:55:59Z

LGTM label has been added.

Git tree hash: 1249f6da20fa2fbcf47fa5d311ef7a6acd7bf51f

carlory · 2024-09-27T09:56:34Z

/cc @SataQiu @pacoxu

pacoxu · 2024-09-29T08:52:24Z

cmd/kubeadm/app/cmd/phases/reset/cleanupnode.go

@@ -105,7 +105,7 @@ func runCleanupNode(c workflow.RunData) error {
 			klog.Warningf("[reset] Failed to remove containers: %v\n", err)
 		}
 	} else {
-		fmt.Println("[reset] Would remove Kubernetes-managed containers")
+		fmt.Println("[dyrrun] Would remove Kubernetes-managed containers")


Suggested change

fmt.Println("[dyrrun] Would remove Kubernetes-managed containers")

fmt.Println("[dryrun] Would remove Kubernetes-managed containers")

typo

pacoxu · 2024-09-29T08:59:16Z

cmd/kubeadm/app/cmd/phases/reset/cleanupnode.go

@@ -176,7 +176,7 @@ func resetConfigDir(configPathDir string, dirsToClean []string, isDryRun bool) {
 			}
 		}
 	} else {
-		fmt.Printf("[reset] Would delete files: %v\n", filesToClean)
+		fmt.Printf("[dryrun] Would delete files: %v\n", filesToClean)


Should file path be quoted using %q?

k8s-ci-robot requested review from carlory and SataQiu August 19, 2024 06:49

k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 19, 2024

neolit123 force-pushed the 1.31-improve-dry-run-logic branch 4 times, most recently from 174edd6 to 451fa49 Compare August 20, 2024 15:57

k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 3, 2024

neolit123 force-pushed the 1.31-improve-dry-run-logic branch from 451fa49 to f277228 Compare September 3, 2024 14:33

k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 3, 2024

neolit123 force-pushed the 1.31-improve-dry-run-logic branch from f277228 to 7dc40c5 Compare September 3, 2024 16:26

k8s-ci-robot changed the title ~~WIP: kubeadm: refactor the dry-run logic~~ kubeadm: refactor the dry-run logic Sep 6, 2024

k8s-ci-robot added needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. and removed do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. labels Sep 6, 2024

neolit123 force-pushed the 1.31-improve-dry-run-logic branch from 7dc40c5 to 66e222c Compare September 10, 2024 13:28

k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 10, 2024

carlory reviewed Sep 11, 2024

View reviewed changes

cmd/kubeadm/app/cmd/join.go Outdated Show resolved Hide resolved

carlory reviewed Sep 11, 2024

View reviewed changes

cmd/kubeadm/app/discovery/discovery.go Outdated Show resolved Hide resolved

carlory reviewed Sep 11, 2024

View reviewed changes

k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 13, 2024

neolit123 force-pushed the 1.31-improve-dry-run-logic branch from 66e222c to 7539b95 Compare September 24, 2024 13:43

k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 24, 2024

neolit123 force-pushed the 1.31-improve-dry-run-logic branch from 7539b95 to 5b0a316 Compare September 25, 2024 14:01

k8s-ci-robot assigned carlory Sep 27, 2024

k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 27, 2024

k8s-ci-robot requested a review from pacoxu September 27, 2024 09:56

pacoxu reviewed Sep 29, 2024

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

kubeadm: refactor the dry-run logic #126776

kubeadm: refactor the dry-run logic #126776

neolit123 commented Aug 19, 2024 •

edited

Loading

k8s-ci-robot commented Aug 19, 2024

neolit123 commented Aug 19, 2024

neolit123 commented Aug 19, 2024 •

edited

Loading

neolit123 commented Sep 6, 2024

neolit123 commented Sep 10, 2024

carlory Sep 11, 2024

carlory Sep 11, 2024

carlory Sep 11, 2024

neolit123 Sep 11, 2024 •

edited

Loading

neolit123 Sep 25, 2024 •

edited

Loading

carlory Sep 11, 2024

neolit123 Sep 11, 2024 •

edited

Loading

neolit123 Sep 25, 2024 •

edited

Loading

carlory Sep 11, 2024

neolit123 Sep 11, 2024

neolit123 Sep 25, 2024

neolit123 commented Sep 24, 2024

neolit123 commented Sep 27, 2024

carlory commented Sep 27, 2024

k8s-ci-robot commented Sep 27, 2024

carlory commented Sep 27, 2024

pacoxu Sep 29, 2024

pacoxu Sep 29, 2024

	fmt.Println("[dyrrun] Would remove Kubernetes-managed containers")
	fmt.Println("[dryrun] Would remove Kubernetes-managed containers")

kubeadm: refactor the dry-run logic #126776

Are you sure you want to change the base?

kubeadm: refactor the dry-run logic #126776

Conversation

neolit123 commented Aug 19, 2024 • edited Loading

What type of PR is this?

What this PR does / why we need it:

Which issue(s) this PR fixes:

Special notes for your reviewer:

Does this PR introduce a user-facing change?

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

k8s-ci-robot commented Aug 19, 2024

neolit123 commented Aug 19, 2024

neolit123 commented Aug 19, 2024 • edited Loading

neolit123 commented Sep 6, 2024

neolit123 commented Sep 10, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

neolit123 Sep 11, 2024 • edited Loading

Choose a reason for hiding this comment

neolit123 Sep 25, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

neolit123 Sep 11, 2024 • edited Loading

Choose a reason for hiding this comment

neolit123 Sep 25, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

neolit123 commented Sep 24, 2024

neolit123 commented Sep 27, 2024

carlory commented Sep 27, 2024

k8s-ci-robot commented Sep 27, 2024

carlory commented Sep 27, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

neolit123 commented Aug 19, 2024 •

edited

Loading

neolit123 commented Aug 19, 2024 •

edited

Loading

neolit123 Sep 11, 2024 •

edited

Loading

neolit123 Sep 25, 2024 •

edited

Loading

neolit123 Sep 11, 2024 •

edited

Loading

neolit123 Sep 25, 2024 •

edited

Loading