migrate users away from CRI socket paths that don't have URL scheme #2426

Closed
4 tasks done
neolit123 opened this issue Mar 31, 2021 · 17 comments
Labels
area/ecosystem kind/deprecation Categorizes issue or PR as related to a feature/enhancement marked for deprecation. lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. priority/backlog Higher priority than priority/awaiting-more-evidence. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. sig/node Categorizes an issue or PR as relevant to SIG Node.
Milestone

Comments

@neolit123
Member

neolit123 commented Mar 31, 2021

summary of the problem:

  • the kubeadm default socket paths on Linux don't have the unix:// prefix.
  • kubeadm socket detection only checks for files on disk; it does not dial the socket and does not prepend unix://
  • the kubelet has long deprecated paths without unix:// and it might stop supporting them in the future

what we should do:

  • we should tell the user that paths without unix:// are deprecated and will cause an error in 3 or 4 releases (GA)
  • for new kubeadm clusters we should start showing a warning if the user doesn't have unix:// in the path.
  • during kubeadm upgrade, we should iterate all nodes in the cluster and patch "kubeadm.alpha.kubernetes.io/cri-socket"
  • in X releases turn the warning into an error.
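
A minimal sketch of the proposed behavior, in Python for illustration only (kubeadm itself is written in Go, and this is not its implementation). The function name `normalize_cri_socket` and the `default_scheme` parameter are hypothetical; the check simply treats any value containing `://` as already having a URL scheme.

```python
def normalize_cri_socket(path: str, default_scheme: str = "unix") -> tuple[str, bool]:
    """Return (endpoint, was_missing_scheme) for a CRI socket path.

    Hypothetical helper: the name and signature are assumptions for this sketch.
    """
    if "://" in path:
        # Already a URL-style endpoint, e.g. unix:///run/containerd/containerd.sock.
        return path, False
    # Bare filesystem path: prepend the scheme and report that a warning is due.
    return f"{default_scheme}://{path}", True


endpoint, deprecated = normalize_cri_socket("/run/containerd/containerd.sock")
if deprecated:
    print(f"warning: CRI socket paths without a URL scheme are deprecated; using {endpoint}")
```

Returning a flag rather than printing inside the helper leaves the caller free to emit a warning now and switch to an error in a later release.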

1.24 action items:

1.25 action items:

1.26

future release:

  • turn warnings into errors if the kubelet does that in the same release?
@neolit123 neolit123 added area/ecosystem kind/deprecation Categorizes issue or PR as related to a feature/enhancement marked for deprecation. sig/node Categorizes an issue or PR as relevant to SIG Node. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. labels Mar 31, 2021
@neolit123 neolit123 added this to the v1.22 milestone Mar 31, 2021
@pacoxu
Member

pacoxu commented Apr 1, 2021

kubernetes/kubernetes#100578 is not handling the upgrade case.

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 30, 2021
@neolit123 neolit123 modified the milestones: v1.22, v1.23 Jul 5, 2021
@neolit123
Member Author

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 26, 2021
@pacoxu
Member

pacoxu commented Aug 5, 2021

While testing 1.22, I found that the kubeadm.alpha.kubernetes.io/cri-socket annotation is missing from my 1.22.0-beta.0 node.

[root@daocloud ~]# kubeadm upgrade plan
[upgrade/config] Making sure the configuration is correct:
[upgrade/config] Reading configuration from the cluster...
[upgrade/config] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[upgrade/config] FATAL: failed to get node registration: node daocloud doesn't have kubeadm.alpha.kubernetes.io/cri-socket annotation
To see the stack trace of this error execute with --v=5 or higher

I got the error when I upgraded from v1.22.0-beta.1 to v1.22.0 with kubelet 1.21.1.

I will look into the problem here. (I am not familiar with the history of this annotation and need to do some digging.) [One of my cluster nodes was wrongly removed with kubectl delete node xxx, and I restarted the kubelet to recover from that, but the re-registered node has no labels or annotations. That is the reason for the error, so we can either ignore the error here or add an enhancement to auto-detect the socket.]
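
To find affected nodes before running an upgrade, something like the following sketch could help. It assumes the input is the parsed JSON of `kubectl get nodes -o json`; the function name `nodes_needing_attention` and the reason strings are hypothetical and not part of kubeadm.

```python
CRI_SOCKET_ANNOTATION = "kubeadm.alpha.kubernetes.io/cri-socket"


def nodes_needing_attention(node_list: dict) -> dict[str, str]:
    """Map node name -> reason for nodes whose annotation is missing or has no URL scheme."""
    problems = {}
    for node in node_list.get("items", []):
        meta = node.get("metadata", {})
        name = meta.get("name", "<unknown>")
        # "annotations" may be absent or null on a freshly re-registered node.
        socket = (meta.get("annotations") or {}).get(CRI_SOCKET_ANNOTATION)
        if socket is None:
            problems[name] = "missing annotation"
        elif "://" not in socket:
            problems[name] = "no URL scheme"
    return problems


# Hand-written sample data standing in for real cluster output.
sample = {
    "items": [
        {"metadata": {"name": "cp-1", "annotations": {CRI_SOCKET_ANNOTATION: "unix:///run/containerd/containerd.sock"}}},
        {"metadata": {"name": "worker-1", "annotations": {CRI_SOCKET_ANNOTATION: "/var/run/dockershim.sock"}}},
        {"metadata": {"name": "worker-2"}},
    ]
}
print(nodes_needing_attention(sample))
```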

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 3, 2021
@neolit123 neolit123 removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 8, 2021
@neolit123 neolit123 modified the milestones: v1.23, v1.24 Nov 23, 2021
@neolit123
Member Author

neolit123 commented Dec 21, 2021

i can try tackling the upgrade problem for 1.24 as part of the dockershim refactors. #2626

it might be a good idea to do this soon given the kubelet endpoint flag is no longer "experimental".

@neolit123
Member Author

updated PR is here:
kubernetes/kubernetes#107295

@neolit123 neolit123 added the lifecycle/active Indicates that an issue or PR is actively being worked on by a contributor. label Jan 3, 2022
@neolit123 neolit123 changed the title linux: migrate users away from CRI socket paths that don't have unix:// migrate users away from CRI socket paths that don't have URL scheme Jan 11, 2022
@neolit123 neolit123 modified the milestones: v1.24, v1.25 Jan 11, 2022
@neolit123 neolit123 removed the lifecycle/active Indicates that an issue or PR is actively being worked on by a contributor. label Jan 13, 2022
@pacoxu
Member

pacoxu commented Aug 24, 2022

Is v1.26 too early to turn the warnings into errors for users?
Either v1.26 or v1.27 is OK for me. I prefer v1.27 or later.

@neolit123
Member Author

1.27 sounds better but we might want to do it only after the kubelet starts doing it, if that ever happens.

@neolit123 neolit123 modified the milestones: v1.26, v1.27 Aug 25, 2022
@Foritus

Foritus commented Sep 17, 2022

As a note for people (like me) who ended up at this issue via search engines, if your kubeadm'ed cluster is throwing this error when running kubeadm upgrade node:

[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
error execution phase kubelet-config: could not retrieve the node registration options for this node:
  node your-node-goes-here doesn't have kubeadm.alpha.kubernetes.io/cri-socket annotation

You can fix this by manually doing:

kubectl edit node <nodename>

and in the annotations section add this:

kubeadm.alpha.kubernetes.io/cri-socket: unix:///run/containerd/containerd.sock

Make sure to verify this is the correct socket path (snoop a control plane node to see what they have set, or just use the same one you use for crictl on the command line on that given node).
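
As a non-interactive alternative to `kubectl edit`, the same annotation can be set with `kubectl annotate ... --overwrite`. The sketch below only builds and prints the command so it can be reviewed before running; the helper name `annotate_command` is hypothetical, and the node name and socket path are the placeholders from the comment above, not values to copy blindly.

```python
import shlex

CRI_SOCKET_ANNOTATION = "kubeadm.alpha.kubernetes.io/cri-socket"


def annotate_command(node_name: str, socket: str) -> list[str]:
    """Build a kubectl argv that sets the CRI socket annotation on one node."""
    if "://" not in socket:
        # Refuse bare paths so the annotation always carries a URL scheme.
        raise ValueError(f"socket {socket!r} has no URL scheme, e.g. unix://")
    return [
        "kubectl", "annotate", "node", node_name,
        f"{CRI_SOCKET_ANNOTATION}={socket}",
        "--overwrite",
    ]


cmd = annotate_command("your-node-goes-here", "unix:///run/containerd/containerd.sock")
# Print for review; pass `cmd` to subprocess.run(cmd, check=True) to actually apply it.
print(shlex.join(cmd))
```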

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 16, 2022
@neolit123
Member Author

turn warnings into errors?

@pacoxu i think we can put that on hold until the kubelet decides to error out.

or perhaps just wait for at least one more release.

@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jan 15, 2023
@neolit123 neolit123 removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Jan 15, 2023
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 15, 2023
@neolit123 neolit123 modified the milestones: v1.27, v1.28, Next Apr 17, 2023
@neolit123 neolit123 added priority/backlog Higher priority than priority/awaiting-more-evidence. lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Apr 19, 2023
@neolit123
Member Author

moved to "Next" milestone, lowered priority and frozen

@neolit123 neolit123 removed their assignment Nov 8, 2023
@neolit123
Member Author

let's close this for now. in a future release if the kubelet drops support, we can start erroring on the kubeadm side before a kubelet is deployed.

6 participants