Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kubeadm: refactor the dry-run logic #126776

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

neolit123
Copy link
Member

@neolit123 neolit123 commented Aug 19, 2024

What type of PR is this?

/kind cleanup

What this PR does / why we need it:

The current dryrun client implemnetation is suboptimal
and sparse. It has the following problems:

  • When an object CREATE or UPDATE reaches the default dryrun client
    the operation is a NO-OP, which means subsequent GET calls must
    fully emulate the object that exists in the store.
  • There are multiple implmentations of a DryRunGetter interface
    such the one in init_dryrun.go but there are no implementations
    for reset, upgrade, join.
  • There is a specific DryRunGetter that is backed by a real
    client in clientbacked_dryrun.go, but this is used for upgrade
    and does not work in conjuction with a fake client.

This commit does the following changes:

  • Removes all existing dryrun.go implementations.
  • Add a new DryRun implementation in dryrun.go that implements
    3 clients - fake clientset, real clientset, real dynamic client.
  • The DryRun object uses the method chaining pattern.
  • Allows the user opt-in into real clients only if needed, by passing
    a real kubeconfig. By default only constructs a fake client.
  • The default reactor chain for the fake client, always logs the
    object action, then for GET or LIST actions attempts to use the
    real dynamic client to get the object. If a real object does not
    exist it attempts to get the object from the fake object store.
  • The user can prepend or append reactors to the chain.
  • All known needed reactors for operations during init, join,
    reset, upgrade are added as methods of the DryRun struct.
  • Adds detailed unit test for the DryRun struct and its methods
    including reactors.

Additional changes:

  • Use the new DryRun implementation in all command workflows -
    init, join, reset, upgrade.
  • Ensure that --dry-run works even if there is no active cluster
    by returning faked objects. For join, a faked cluster-info
    with a fake bootstrap token and CA are used.

Which issue(s) this PR fixes:

Fixes kubernetes/kubeadm#2653
Fixes kubernetes/kubeadm#1932

Special notes for your reviewer:

NONE

Does this PR introduce a user-facing change?

kubeadm: increased the verbosity of API client dry-run actions during the subcommands "init", "join", "upgrade" and "reset". Allowed dry-run on 'kubeadm join' even if there is no existing cluster by utilizing a faked, in-memory cluster-info ConfigMap.

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Aug 19, 2024
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: neolit123

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. area/kubeadm sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Aug 19, 2024
@neolit123
Copy link
Member Author

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 19, 2024
@neolit123
Copy link
Member Author

neolit123 commented Aug 19, 2024

/triage accepted
/priority backlog

@k8s-ci-robot k8s-ci-robot added priority/backlog Higher priority than priority/awaiting-more-evidence. triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-priority Indicates a PR lacks a `priority/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Aug 19, 2024
@neolit123 neolit123 force-pushed the 1.31-improve-dry-run-logic branch 4 times, most recently from 174edd6 to 451fa49 Compare August 20, 2024 15:57
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 3, 2024
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 3, 2024
@neolit123
Copy link
Member Author

/retitle kubeadm: refactor the dry-run logic

this is ready for review

@k8s-ci-robot k8s-ci-robot changed the title WIP: kubeadm: refactor the dry-run logic kubeadm: refactor the dry-run logic Sep 6, 2024
@k8s-ci-robot k8s-ci-robot added needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. and removed do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. labels Sep 6, 2024
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 10, 2024
@neolit123
Copy link
Member Author

rebased

// - Log the action.
// - If the action is not GET or LIST just use the fake client store to write it, unless
// a user reactor was added with PrependReactor() or AppendReactor().
// - Attempt to GET or LIST using the real dynamic client.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

real dynamic client to get the object. If a real object does not
exist it attempts to get the object from the fake object store.

The above message is copied from the PR's description. It‘s not truth when the real client is instantiated and the given object doesn't exist in the cluster, it will mark the action as handled.

// handleGetAction tries to handle all GET actions with the dynamic client.
func (d *DryRun) handleGetAction(action testing.GetAction) (bool, runtime.Object, error) {
	if d.dynamicClient == nil {
		return false, nil, errors.New("dynamicClient is nil")
	}

	unstructuredObj, err := d.dynamicClient.
		Resource(action.GetResource()).
		Namespace(action.GetNamespace()).
		Get(context.Background(), action.GetName(), metav1.GetOptions{})
	if err != nil {
		return true, nil, err
	}

	newObj, err := d.decodeUnstructuredIntoAPIObject(action, unstructuredObj)
	if err != nil {
		return true, nil, err
	}

	return true, newObj, err
}

According to the implementation, it attempts to GET or LIST using the real dynamic client if it is instantiated, otherwise tries to GET or LIST the object from the fake client store. Is it desired?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I didn't see the caller.

				handled, obj, err := d.handleGetAction(getAction)
				if err != nil {
					fmt.Fprintln(d.writer, "[dryrun] Real object does not exist. "+
						"Attempting to GET from followup reactors or from the fake client tracker")
					return false, nil, nil
				}

Please ingore above comments.

if err != nil {
return nil, errors.Wrapf(err, "unable to get internal Kubernetes Service IP from the given service CIDR (%s)", d.cfg.Networking.ServiceSubnet)
dryRun := apiclient.NewDryRun()
if err := dryRun.WithKubeConfigFile(d.KubeConfigPath()); err != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In dry-run mode, d.KubeConfigPath() always returns the kubeconfig path from a kubeadm tmp dir.

I'm not sure whether the created client is able to talk with the real cluster?

Do sub phases under init phase need a real cluster when a user run a sub phase with the --dry-run flag.

Copy link
Member Author

@neolit123 neolit123 Sep 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure whether the created client is able to talk with the real cluster?

yes, it's a generated admin.conf, it just resides in the tmp dir.

Do sub phases under init phase need a real cluster when a user run a sub phase with the --dry-run flag.

that's a good Q because there is a problem with this it seems.
the phases can work with dry-run but the admin.conf in the TMP dir doesn't exist.

here is what happens:

sudo kubeadm init phase upload-config all --dry-run
error execution phase upload-config/kubeadm: failed to load kubeconfig: open /etc/kubernetes/tmp/kubeadm-init-dryrun3932008713/admin.conf: no such file or directory

so the CA and admin.conf don't exist for this scoped phase dry-run.
but sudo kubeadm init --dry-run as a whole would work as that tmp dir admin.conf will exist ad KubeConfigPath will return it.

even if the user runs:

sudo kubeadm init phase certs ca
sudo kubeadm init phase kubeconfig admin

sudo kubeadm init phase upload-config all --dry-run
error execution phase upload-config/kubeadm: failed to load kubeconfig: open /etc/kubernetes/tmp/kubeadm-init-dryrun3932008713/admin.conf: no such file or directory

it would fail as the non-dry-run commands generated a CA/kubeconfig in the default location /etc/kubernetes/

but the --dry-run is looking for an admin.conf that exists in the tmp dir.

i think to avoid this problem the logic in init.go can check if the tmp dir file exists and if it doesn't it can fall back to the default /etc/kubernetes/admin.conf

i need to test this idea or find a better one..

Copy link
Member Author

@neolit123 neolit123 Sep 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@carlory so i approached this differently. the problem is that a tmp dir is used for many files not only the admin.conf.

i think the only way to allow running individual phases in dry-run mode is to allow the user to specify the dry-run directory. we already had an env variable for that called KUBEADM_INIT_DRYRUN_DIR. i just added new ones for UPGRADE and JOIN as well. so here is how this can be used.

lubo@3B69Q73:~/go/src/k8s.io/kubernetes$ sudo kubeadm init phase certs ca --dry-run
W0925 15:27:34.028290  147394 constants.go:610] Using dry-run directory /etc/kubernetes/tmp/kubeadm-init-dryrun1123083720. To override it, set the environment variable KUBEADM_INIT_DRYRUN_DIR
[certs] Using existing ca certificate authority

lubo@3B69Q73:~/go/src/k8s.io/kubernetes$ sudo ls /etc/kubernetes/tmp/kubeadm-init-dryrun1123083720
ca.crt  ca.key
lubo@3B69Q73:~/go/src/k8s.io/kubernetes$ sudo KUBEADM_INIT_DRYRUN_DIR=/etc/kubernetes/tmp/kubeadm-init-dryrun1123083720
kubeadm init phase kubeconfig all --dry-run
[kubeconfig] Using kubeconfig folder "/etc/kubernetes/tmp/kubeadm-init-dryrun1123083720"
[kubeconfig] Writing "admin.conf" kubeconfig file
[kubeconfig] Writing "super-admin.conf" kubeconfig file
[kubeconfig] Writing "kubelet.conf" kubeconfig file
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file

so when not setting the env var a random dir is generted, but if the env var is set it can be used for phase dry-run.

i will push an update in a bit.

return retrieveValidatedConfigInfo(nil, cfg, constants.DiscoveryRetryInterval, timeout)
func RetrieveValidatedConfigInfo(dryRunClient clientset.Interface, cfg *kubeadmapi.Discovery, timeout time.Duration) (*clientcmdapi.Config, error) {
isDryRun := dryRunClient != nil
isTesting := false
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the purpose of isTesting? Can we remove it?

Copy link
Member Author

@neolit123 neolit123 Sep 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's a left over from when i was testing with different ways to make the changes in token.go. i will remove it.

Copy link
Member Author

@neolit123 neolit123 Sep 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually, it has a purpose. without it the function retrieveValidatedConfigInfo is difficult to unit test.
during tests retrieveValidatedConfigInfo gets isDryRun = false and isTesting = true, that results in no loading of additional kubeconfig files (retrieveValidatedConfigInfo loads kubeconfig files). during tests client is a fake client and during real usage client is a dryRun client. getClusterInfo() is called from retrieveValidatedConfigInfo() which complicates it further.

ideas to improve this are welcome, but i think it might be suitable for a new PR that would refactor / break apart retrieveValidatedConfigInfo. perhaps in multiple functions.

var err error

client := dryRunClient
if dryRunClient == nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to move the creating client logic to InitCfg() ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, makes sense

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

moved this to fetchInitConfigurationFromJoinConfiguration as that func also needs tlsBootstrapCfg.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 13, 2024
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 24, 2024
@neolit123
Copy link
Member Author

just rebased.
still need to address your comments @carlory

The current dryrun client implemnetation is suboptimal
and sparse. It has the following problems:

- When an object CREATE or UPDATE reaches the default dryrun client
the operation is a NO-OP, which means subsequent GET calls must
fully emulate the object that exists in the store.
- There are multiple implmentations of a DryRunGetter interface
such the one in init_dryrun.go but there are no implementations
for reset, upgrade, join.
- There is a specific DryRunGetter that is backed by a real
client in clientbacked_dryrun.go, but this is used for upgrade
and does not work in conjuction with a fake client.

This commit does the following changes:

- Removes all existing *dryrun*.go implementations.
- Add a new DryRun implementation in dryrun.go that implements
3 clients - fake clientset, real clientset, real dynamic client.
- The DryRun object uses the method chaining pattern.
- Allows the user opt-in into real clients only if needed, by passing
a real kubeconfig. By default only constructs a fake client.
- The default reactor chain for the fake client, always logs the
object action, then for GET or LIST actions attempts to use the
real dynamic client to get the object. If a real object does not
exist it attempts to get the object from the fake object store.
- The user can prepend or append reactors to the chain.
- All known needed reactors for operations during init, join,
reset, upgrade are added as methods of the DryRun struct.
- Adds detailed unit test for the DryRun struct and its methods
including reactors.

Additional changes:
- Use the new DryRun implementation in all command workflows -
init, join, reset, upgrade.
- Ensure that --dry-run works even if there is no active cluster
by returning faked objects. For join, a faked cluster-info
with a fake bootstrap token and CA are used.
@neolit123
Copy link
Member Author

looking for more reviews.

@carlory
Copy link
Member

carlory commented Sep 27, 2024

Thanks for your update. It looks good to me.

/lgtm

Looking another reviewer for the final approval.
/hold

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 27, 2024
@k8s-ci-robot
Copy link
Contributor

LGTM label has been added.

Git tree hash: 1249f6da20fa2fbcf47fa5d311ef7a6acd7bf51f

@carlory
Copy link
Member

carlory commented Sep 27, 2024

/cc @SataQiu @pacoxu

@@ -105,7 +105,7 @@ func runCleanupNode(c workflow.RunData) error {
klog.Warningf("[reset] Failed to remove containers: %v\n", err)
}
} else {
fmt.Println("[reset] Would remove Kubernetes-managed containers")
fmt.Println("[dyrrun] Would remove Kubernetes-managed containers")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
fmt.Println("[dyrrun] Would remove Kubernetes-managed containers")
fmt.Println("[dryrun] Would remove Kubernetes-managed containers")

typo

@@ -176,7 +176,7 @@ func resetConfigDir(configPathDir string, dirsToClean []string, isDryRun bool) {
}
}
} else {
fmt.Printf("[reset] Would delete files: %v\n", filesToClean)
fmt.Printf("[dryrun] Would delete files: %v\n", filesToClean)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should file path be quoted using %q?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/kubeadm cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/backlog Higher priority than priority/awaiting-more-evidence. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

add dry run e2e tests the dynamic dryrun client in kubeadm only supports the core/v1 GroupVersion
4 participants