Insulate users from kubeadm API version changes #2769
Comments
I would like to add:
Final thought.
Agreed to all of @fabriziopandini's points. kubeadm follows the philosophy of a CLI tool (like ssh, ftp, etc.) and it cannot anticipate all infrastructure-related failures, but having a sane, best-effort amount of retries in the CLI tool makes sense.
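To make "best-effort retries" concrete, here is a minimal sketch (not kubeadm's actual implementation) of wrapping a transiently failing step in exponential backoff using the wait helpers from k8s.io/apimachinery; the step name, delays, and attempt count are illustrative assumptions:

```go
package main

import (
	"fmt"
	"time"

	"k8s.io/apimachinery/pkg/util/wait"
)

// joinControlPlane is a stand-in for a step that can fail transiently,
// e.g. while an API server endpoint behind a load balancer settles.
func joinControlPlane() error {
	// ... real work would go here ...
	return fmt.Errorf("connection refused")
}

func main() {
	backoff := wait.Backoff{
		Duration: 5 * time.Second, // initial delay
		Factor:   2.0,             // double the delay each attempt
		Steps:    5,               // give up after 5 attempts
	}

	err := wait.ExponentialBackoff(backoff, func() (bool, error) {
		if err := joinControlPlane(); err != nil {
			fmt.Printf("retrying after transient error: %v\n", err)
			return false, nil // not done yet, retry
		}
		return true, nil // success, stop retrying
	})
	if err != nil {
		fmt.Printf("giving up: %v\n", err)
	}
}
```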
Hopefully scheduled for 1.19. Depends a lot on sig-release and partly on sig-arch!
This can be useful, no doubt. Like I've mentioned today, interestingly we have not seen major complaints about the failures CAPI is seeing. Users are applying custom timeouts around their cluster creation on custom infrastructure (e.g. "I know what my GCE-running cluster needs").
@randomvariable can you expand on this point?
There is a tracking issue for that as well. It will be a long process and the timeline is unclear.
For #2254 we will likely have some component on the machine call back to an infrastructure API notification service (or back to the management cluster) to provide information about the failure. Providing users with access to the log is one case, but a machine-readable output, perhaps along the lines of an expanded range of error codes, could update a specific condition on the Machine reflecting the exact kubeadm failure. A controller could then take appropriate remediative action. I agree this is long-term, however.
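As a hedged sketch of that long-term idea: if kubeadm reported richer, machine-readable failure reasons, a controller could surface them as a condition that remediation logic reacts to. The condition type and reason strings below are purely hypothetical, not existing Cluster API or kubeadm APIs:

```go
package main

import (
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// Hypothetical machine-readable failure reasons that kubeadm could emit.
const (
	reasonEtcdUnavailable      = "EtcdUnavailable"
	reasonAPIServerUnreachable = "APIServerUnreachable"
)

// conditionFromKubeadmFailure translates a hypothetical kubeadm failure
// reason into a condition a controller could set on a Machine, so that
// remediation can react to the specific failure rather than a generic error.
func conditionFromKubeadmFailure(reason, message string) metav1.Condition {
	return metav1.Condition{
		Type:    "BootstrapSucceeded", // hypothetical condition type
		Status:  metav1.ConditionFalse,
		Reason:  reason,
		Message: message,
	}
}

func main() {
	c := conditionFromKubeadmFailure(reasonEtcdUnavailable,
		"kubeadm join failed: etcd cluster is not healthy")
	fmt.Printf("%s=%s (%s): %s\n", c.Type, c.Status, c.Reason, c.Message)
}
```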
/area dependency
Removing this as a proposal; it rather seems like a future cleanup.
/kind cleanup
WRT:
In 1.19 kubeadm merged a number of fixes and backported them to 1.17 and 1.18: kubernetes/kubeadm#2091
/assign @fabriziopandini
Adding up-to-date comments to the rest of the tasks:
[1] The timeline is unclear; we are blocked on the lack of a policy for component extractions out of k/k.
The fixes above should cover this task.
We did not merge any PRs in 1.19 for MRO as the contributor was busy with other tasks, but the boilerplate is in place.
This is very long term, potentially after [1].
v1beta1 is scheduled for removal in kubeadm 1.20, and my proposal would be to keep us on track for this effort.
/milestone v0.4.0
@neolit123 thanks for the update!
xref my comment from #3323 (comment): We should stop exposing the kubeadm v1betax types in our KubeadmConfig/KubeadmControlPlane specs, and instead use our own types. This would allow us to separate what users fill in from which kubeadm API version we end up using in our bootstrap data. As @detiber pointed out, we still have to know which version of the kubeadm types to use when generating our kubeadm yaml file and when interacting with the kubeadm-config ConfigMap.
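To make that separation concrete, here is a minimal sketch of the idea (all type and function names are hypothetical, not actual CABPK APIs): users fill in a Cluster API-owned config type, and the bootstrap provider converts it into whichever kubeadm API version is appropriate when rendering the kubeadm YAML:

```go
package bootstrap

// CAPIKubeadmClusterConfiguration is a hypothetical Cluster API-owned type
// that users would fill in instead of the kubeadm v1betaX types directly.
type CAPIKubeadmClusterConfiguration struct {
	KubernetesVersion    string
	ControlPlaneEndpoint string
	ClusterName          string
}

// kubeadmClusterConfigYAML renders the user-facing type into a kubeadm
// ClusterConfiguration document for the chosen kubeadm API version.
// Real code would marshal the corresponding Go types rather than
// templating strings.
func kubeadmClusterConfigYAML(cfg CAPIKubeadmClusterConfiguration, apiVersion string) string {
	return "apiVersion: kubeadm.k8s.io/" + apiVersion + "\n" +
		"kind: ClusterConfiguration\n" +
		"clusterName: " + cfg.ClusterName + "\n" +
		"kubernetesVersion: " + cfg.KubernetesVersion + "\n" +
		"controlPlaneEndpoint: " + cfg.ControlPlaneEndpoint + "\n"
}
```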
I know we don't have a label for it, but just for tracking:
/area node-agent
/assign @fabriziopandini
This task is going to be tackled as a release blocker for CABPK first. As part of it, we also need a plan of action for v1alpha3, where we translate the v1beta1 types currently used into v1beta2 or later when creating newer Kubernetes clusters.
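A small, hedged sketch of that translation step: pick the kubeadm config API version from the target Kubernetes version, then convert the stored v1beta1 fields accordingly. The cutoff below (v1beta2 from Kubernetes 1.15 onward) reflects kubeadm's documented history, but treat the function as an illustration rather than the exact logic CABPK uses:

```go
package bootstrap

import (
	"k8s.io/apimachinery/pkg/util/version"
)

// kubeadmAPIVersionFor chooses which kubeadm config API group version to
// render for a given target Kubernetes version. The cutoff is illustrative:
// kubeadm introduced the v1beta2 config API in Kubernetes 1.15.
func kubeadmAPIVersionFor(kubernetesVersion string) (string, error) {
	v, err := version.ParseSemantic(kubernetesVersion)
	if err != nil {
		return "", err
	}
	if v.AtLeast(version.MustParseSemantic("1.15.0")) {
		return "v1beta2", nil
	}
	return "v1beta1", nil
}
```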
/lifecycle active
/kind release-blocking
@vincepri @CecileRobertMichon I think we can close this one. Instead, IMO we should bump up the priority of the following issues/PRs, which are mandatory to get v1alpha4 to work with Kubernetes v1.22:
@fabriziopandini @CecileRobertMichon @vincepri Will 1.22 support be mandatory for the CAPI v0.4.0 release, and does "bump up" then mean release blocking? (I don't really know when we want to release v0.4.0; the Kubernetes 1.22 release seems to be mid-August according to https://github.com/justaugustus/enhancements/blob/2887453eac5cbc5fbd31112fd3d0be2be17b456c/keps/sig-release/2572-release-cadence/README.md)
Yes, by "bump up" I meant making the issues listed above release blocking.
+1 to not blocking the v0.4 release on the release timeline of k8s 1.22. @fabriziopandini, you are saying that we can cut v1alpha4 even if 1.22 is not out yet, given that we have fixed the known compatibility issues defined above, correct?
Yes, let's do our best to get ready for v1.22 within v0.4, but we should not wait for it (I edited my comment above to make it clear, I hope).
Just to clarify: if we're not ready for 1.22 we shouldn't block the release, but rather add the support in a patch release.
Also, given that 1.22 isn't ready yet, we're relying on an alpha/beta/rc version which, until release, could still bring in additional required changes.
I think we are in agreement.
What about closing this issue?
/close
@vincepri: Closing this issue.
Goals
Non-Goals/Future Work
User Story
As an operator, I want kubeadm to have better support for Cluster API's use cases to reduce the number of failed machines in my infrastructure.
Detailed Description
In a number of environments, machines can intermittently fail to bootstrap. The most common of these failures occur during control plane joins, which lead to temporary changes in etcd and API server availability, mediated by the speed of the underlying infrastructure and the particulars of infrastructure load balancers.
Some ugly hacks have been introduced, notably #2763, to retry kubeadm operations. As a long-term solution, Cluster API should be a good kubeadm citizen and make changes in kubeadm to do the appropriate retries, covering the variety of infrastructure providers supported by Cluster API. In addition, the KCP controller re-implements some of the
Contract changes [optional]
Data model changes [optional]
[Describe contract changes between Cluster API controllers, if applicable.]
/kind proposal