NodeClaims deleted but not getting a reason for the deletion #7083
What version of Karpenter are you using? What's the k8s version?
Hi @jigisha620. Karpenter version: 1.0.1. Why don't we receive any logs indicating the reason when a NodeClaim is deleted due to node expiration or when a spot instance becomes unavailable?
Can you share the NodePool, NodeClass, NodeClaim, and a more complete set of logs?
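For anyone else gathering the same information, a minimal sketch of how to collect what's being asked for here (the resource names and the install namespace are placeholders, not taken from this issue):

```sh
# Dump the Karpenter resources requested above (names are placeholders).
kubectl get nodepool <nodepool-name> -o yaml
kubectl get ec2nodeclass <nodeclass-name> -o yaml   # AWS provider NodeClass
kubectl get nodeclaim <nodeclaim-name> -o yaml

# Grab the controller logs; adjust the namespace to wherever Karpenter is installed.
kubectl logs -n kube-system deploy/karpenter --since=1h
```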
We get the same issue; some of the nodeclaims have the annotation:

Karpenter version: 1.0.6
Is it caused by the new feature?
Just noticed this happening during the first Karpenter startup after the v1 upgrade, on all clusters that we upgraded. I followed the v1 upgrade procedure, and it looks like everything went smoothly. However, during the first startup Karpenter taints nodes that are supposed to be disrupted only on the 1st day of the month.

Karpenter logs show a normal startup without any errors, then it suddenly decides to taint the nodes:

I'm going to try increasing the auto-generated value for the

We're running EKS 1.29, and this happened after the Karpenter upgrade 0.37.5 -> 1.0.6. Here are the resources:

NodePool:
NodeClass:
new NodeClaim (I don't have the old terminated nodeClaim definition):
Here is the complete Karpenter log:

Looks like Karpenter works OK after the upgrade, so in our case the impact is limited to the upgrade window.
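If the post-upgrade tainting is caused by drift, the affected NodeClaims should carry a `Drifted` status condition. A small sketch for checking this, assuming the v1 NodeClaim status schema:

```sh
# Print each NodeClaim alongside the status of its Drifted condition,
# to see whether the upgrade marked existing nodes as drifted.
kubectl get nodeclaims -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.conditions[?(@.type=="Drifted")].status}{"\n"}{end}'
```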
Would anyone be able to point to the line in the code that generates these messages? I'm having difficulty finding these strings in the Karpenter source code... Or does it somehow incorrectly set the taint here?
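One way to locate them: the message strings quoted in this issue's logs ("annotated nodeclaim", "tainted node") are plain log messages, so a recursive search over the core repo should surface them. A sketch (kubernetes-sigs/karpenter is the core repo; the AWS-specific code lives in aws/karpenter-provider-aws):

```sh
# Search the core Karpenter source for the log messages seen above.
git clone https://github.com/kubernetes-sigs/karpenter
grep -rn "tainted node" karpenter/pkg/
grep -rn "annotated nodeclaim" karpenter/pkg/
```

If a plain string search still misses them, the messages may be assembled from structured-logging fields rather than stored as a single literal.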
(Karpenter v1.0.6): Similarly, I'm not happy with the Karpenter logs... I think... I guess... if the NodeClaims are the resources that contain the "coherent" status and events summarized in a useful way, then the NodeClaims need to remain AFTER the node deletion has happened, or be archived somehow. When they're gone, the easily accessible information is gone too! Right now, Karpenter seems to have too many blind spots for production confidence, compared to the calmness and "it just works" nature of Cluster Autoscaler.
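A partial workaround, sketched here rather than an official feature: Kubernetes events referencing a NodeClaim are not deleted with the object and typically survive until the cluster's event TTL (often around an hour), so they can be captured before they expire:

```sh
# List events attached to NodeClaims, newest last, across all namespaces;
# these often record the disruption reason even after the NodeClaim is gone.
kubectl get events -A --field-selector involvedObject.kind=NodeClaim --sort-by=.lastTimestamp
```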
Description
NodeClaims are being deleted, and PDBs are not considered. We need to know the reason why the NodeClaims were deleted.
Here are the logs I got for the NodeClaim deletion:
{"level":"INFO","time":"2024-09-26T13:34:44.210Z","logger":"controller","message":"annotated nodeclaim","commit":"62a726c","controller":"nodeclaim.termination","controllerGroup":"karpenter.sh","controllerKind":"NodeClaim","NodeClaim":{"name":"marketplace-hgbqt"},"namespace":"","name":"marketplace-hgbqt","reconcileID":"0da1aeb2-a5cf-4557-afa9-b39d7225d9d0","Node":{"name":"ip-10-0-97-41.ap-south-1.compute.internal"},"provider-id":"aws:///ap-south-1c/i-0bcf1f7224f0e82e4","karpenter.sh/nodeclaim-termination-timestamp":"2024-09-26T13:36:44Z"}
{"level":"INFO","time":"2024-09-26T13:34:44.441Z","logger":"controller","message":"tainted node","commit":"62a726c","controller":"node.termination","controllerGroup":"","controllerKind":"Node","Node":{"name":"ip-10-0-97-41.ap-south-1.compute.internal"},"namespace":"","name":"ip-10-0-97-41.ap-south-1.compute.internal","reconcileID":"e653dc3c-0f4f-43d8-a04b-77314ba38c88","taint.Key":"karpenter.sh/disrupted","taint.Value":"","taint.Effect":"NoSchedule"}
Then the NodeClaim was deleted.
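While the NodeClaim still exists, its status conditions and events usually carry the disruption reason that these termination log lines omit. A sketch, using the NodeClaim name from the logs above:

```sh
# Inspect the NodeClaim's conditions and events; the disruption reason
# (e.g. Drifted, Empty) is usually recorded here rather than in the
# "annotated nodeclaim" / "tainted node" log lines.
kubectl describe nodeclaim marketplace-hgbqt
kubectl get nodeclaim marketplace-hgbqt -o jsonpath='{.status.conditions}'
```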