
nodeclaims deleted but not getting reason for delete #7083

Open
dhanutalari opened this issue Sep 26, 2024 · 9 comments
Assignees
Labels
bug Something isn't working needs-triage Issues that need to be triaged

Comments

@dhanutalari

Description

NodeClaims are being deleted without PDBs being considered. I need to know the reason why the NodeClaims were deleted.

Here are the logs I got for the NodeClaim deletion:
{"level":"INFO","time":"2024-09-26T13:34:44.210Z","logger":"controller","message":"annotated nodeclaim","commit":"62a726c","controller":"nodeclaim.termination","controllerGroup":"karpenter.sh","controllerKind":"NodeClaim","NodeClaim":{"name":"marketplace-hgbqt"},"namespace":"","name":"marketplace-hgbqt","reconcileID":"0da1aeb2-a5cf-4557-afa9-b39d7225d9d0","Node":{"name":"ip-10-0-97-41.ap-south-1.compute.internal"},"provider-id":"aws:///ap-south-1c/i-0bcf1f7224f0e82e4","karpenter.sh/nodeclaim-termination-timestamp":"2024-09-26T13:36:44Z"}

{"level":"INFO","time":"2024-09-26T13:34:44.441Z","logger":"controller","message":"tainted node","commit":"62a726c","controller":"node.termination","controllerGroup":"","controllerKind":"Node","Node":{"name":"ip-10-0-97-41.ap-south-1.compute.internal"},"namespace":"","name":"ip-10-0-97-41.ap-south-1.compute.internal","reconcileID":"e653dc3c-0f4f-43d8-a04b-77314ba38c88","taint.Key":"karpenter.sh/disrupted","taint.Value":"","taint.Effect":"NoSchedule"}

Then the NodeClaim was deleted.
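In the meantime, one way to see which controllers drove the termination is to filter the structured logs by controller name. A minimal sketch (the field names match the JSON log lines above; the filtering helper itself is mine, not part of Karpenter):

```python
import json

def termination_events(lines):
    """Reduce structured Karpenter log lines to termination-controller entries."""
    events = []
    for entry in map(json.loads, lines):
        # Both nodeclaim.termination and node.termination log the shutdown steps.
        if "termination" in entry.get("controller", ""):
            events.append((entry["time"], entry["controller"], entry["message"]))
    return events

# Fields trimmed down from the log lines pasted above.
sample = [
    '{"time":"2024-09-26T13:34:44.210Z","controller":"nodeclaim.termination","message":"annotated nodeclaim"}',
    '{"time":"2024-09-26T13:34:44.441Z","controller":"node.termination","message":"tainted node"}',
]
for time, controller, message in termination_events(sample):
    print(time, controller, message)
```

This shows when termination started, but not why, which is exactly the gap this issue is about.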

@dhanutalari dhanutalari added bug Something isn't working needs-triage Issues that need to be triaged labels Sep 26, 2024
@jigisha620
Contributor

What version of Karpenter are you using? What's the k8s version?

@dhanutalari
Author

dhanutalari commented Sep 27, 2024

Hi @jigisha620

karpenter version : 1.0.1
k8s version: 1.30

Why don't we receive any logs indicating the reason when a NodeClaim is deleted, for example due to node expiration or because a spot instance becomes unavailable?
How can we determine the reason for a NodeClaim deletion in such cases?

@jigisha620
Contributor

Can you share nodePool, nodeClass, nodeClaim and a more complete set of logs?

@jigisha620 jigisha620 self-assigned this Sep 30, 2024
@orz-nil

orz-nil commented Oct 30, 2024

We hit the same issue: some of the NodeClaims get the annotation karpenter.sh/nodeclaim-termination-timestamp, and when that termination timestamp is reached, the nodes are deleted. The node had not reached its expiry time, and it is an on-demand instance.
Here is the manifest of one of the NodeClaims:

apiVersion: karpenter.sh/v1
kind: NodeClaim
metadata:
  annotations:
    compatibility.karpenter.k8s.aws/cluster-name-tagged: "true"
    compatibility.karpenter.k8s.aws/kubelet-drift-hash: "9225586735335466555"
    compatibility.karpenter.sh/v1beta1-kubelet-conversion: '{"maxPods":110}'
    karpenter.k8s.aws/ec2nodeclass-hash: "17044720421536988815"
    karpenter.k8s.aws/ec2nodeclass-hash-version: v3
    karpenter.k8s.aws/tagged: "true"
    karpenter.sh/nodeclaim-termination-timestamp: "2024-10-30T05:29:18Z"
    karpenter.sh/nodepool-hash: "2620790099938737011"
    karpenter.sh/nodepool-hash-version: v3
    karpenter.sh/stored-version-migrated: "true"
  creationTimestamp: "2024-10-30T02:18:26Z"
  deletionGracePeriodSeconds: 0
  deletionTimestamp: "2024-10-30T05:14:18Z"
  finalizers:
  - karpenter.sh/termination
  generateName: airflow-
  generation: 2
  labels:
    k8s.tubi.io/RancherRole: worker
    k8s.tubi.io/airflow: "true"
    k8s.tubi.io/worker_type: airflow
    karpenter.k8s.aws/instance-category: c
    karpenter.k8s.aws/instance-cpu: "16"
    karpenter.k8s.aws/instance-cpu-manufacturer: amd
    karpenter.k8s.aws/instance-ebs-bandwidth: "10000"
    karpenter.k8s.aws/instance-encryption-in-transit-supported: "true"
    karpenter.k8s.aws/instance-family: c6a
    karpenter.k8s.aws/instance-generation: "6"
    karpenter.k8s.aws/instance-hypervisor: nitro
    karpenter.k8s.aws/instance-memory: "32768"
    karpenter.k8s.aws/instance-network-bandwidth: "6250"
    karpenter.k8s.aws/instance-size: 4xlarge
    karpenter.sh/capacity-type: on-demand
    karpenter.sh/nodepool: airflow
    kubernetes.io/arch: amd64
    kubernetes.io/os: linux
    node.kubernetes.io/instance-type: c6a.4xlarge
    topology.k8s.aws/zone-id: usw2-az2
    topology.kubernetes.io/region: us-west-2
    topology.kubernetes.io/zone: us-west-2b
  name: airflow-p2kj5
  ownerReferences:
  - apiVersion: karpenter.sh/v1
    blockOwnerDeletion: true
    kind: NodePool
    name: airflow
    uid: 71a8322d-58d2-46fe-a55b-93c156224548
  resourceVersion: "5289051911"
  uid: 240351a5-af56-49be-9194-4bfaba57a7de
spec:
  expireAfter: 168h
  nodeClassRef:
    group: karpenter.k8s.aws
    kind: EC2NodeClass
    name: default
  requirements:
  - key: karpenter.k8s.aws/instance-family
    operator: In
    values:
    - c5
    - c6a
    - c6i
  - key: karpenter.k8s.aws/instance-size
    operator: In
    values:
    - 4xlarge
  - key: karpenter.sh/capacity-type
    operator: In
    values:
    - on-demand
  - key: karpenter.sh/nodepool
    operator: In
    values:
    - airflow
  - key: k8s.tubi.io/airflow
    operator: In
    values:
    - "true"
  - key: k8s.tubi.io/worker_type
    operator: In
    values:
    - airflow
  - key: k8s.tubi.io/RancherRole
    operator: In
    values:
    - worker
  - key: node.kubernetes.io/instance-type
    operator: In
    values:
    - c5.4xlarge
    - c6a.4xlarge
    - c6i.4xlarge
  resources:
    requests:
      cpu: 4230m
      memory: 7146Mi
      pods: "9"
  taints:
  - effect: NoSchedule
    key: k8s.tubi.io/airflow
    value: "true"
  terminationGracePeriod: 15m0s
status:
  allocatable:
    cpu: 15890m
    ephemeral-storage: 179Gi
    memory: 27381Mi
    pods: "110"
    vpc.amazonaws.com/pod-eni: "54"
  capacity:
    cpu: "16"
    ephemeral-storage: 200Gi
    memory: 30310Mi
    pods: "110"
    vpc.amazonaws.com/pod-eni: "54"
  conditions:
  - lastTransitionTime: "2024-10-30T02:28:29Z"
    message: ""
    reason: ConsistentStateFound
    status: "True"
    type: ConsistentStateFound
  - lastTransitionTime: "2024-10-30T04:23:31Z"
    message: NodePoolDrifted
    reason: NodePoolDrifted
    status: "True"
    type: Drifted
  - lastTransitionTime: "2024-10-30T02:23:12Z"
    message: ""
    reason: Initialized
    status: "True"
    type: Initialized
  - lastTransitionTime: "2024-10-30T02:18:28Z"
    message: ""
    reason: Launched
    status: "True"
    type: Launched
  - lastTransitionTime: "2024-10-30T02:23:12Z"
    message: ""
    reason: Ready
    status: "True"
    type: Ready
  - lastTransitionTime: "2024-10-30T02:21:58Z"
    message: ""
    reason: Registered
    status: "True"
    type: Registered
  imageID: ami-07897e5d530875b75
  lastPodEventTime: "2024-10-30T05:15:31Z"

@orz-nil

orz-nil commented Oct 30, 2024

Karpenter version: 1.0.6

Controller logs:

{"level":"INFO","time":"2024-10-30T02:18:28.992Z","logger":"controller","message":"launched nodeclaim","commit":"6174c75","controller":"nodeclaim.lifecycle","controllerGroup":"karpenter.sh","controllerKind":"NodeClaim","NodeClaim":{"name":"airflow-p2kj5"},"namespace":"","name":"airflow-p2kj5","reconcileID":"4151ad84-3afe-491a-bc6a-45d09fe1aff4","provider-id":"aws:///us-west-2b/i-xxxxxxx","instance-type":"c6a.4xlarge","zone":"us-west-2b","capacity-type":"on-demand","allocatable":{"cpu":"15890m","ephemeral-storage":"179Gi","memory":"27381Mi","pods":"110","vpc.amazonaws.com/pod-eni":"54"}}
{"level":"INFO","time":"2024-10-30T02:21:58.316Z","logger":"controller","message":"registered nodeclaim","commit":"6174c75","controller":"nodeclaim.lifecycle","controllerGroup":"karpenter.sh","controllerKind":"NodeClaim","NodeClaim":{"name":"airflow-p2kj5"},"namespace":"","name":"airflow-p2kj5","reconcileID":"fe0a1cbb-d2db-41b9-8fd5-afeff7c38c4d","provider-id":"aws:///us-west-2b/i-xxxxxxx","Node":{"name":"ip-172-18-31-105.us-west-2.compute.internal"}}
{"level":"INFO","time":"2024-10-30T02:23:12.728Z","logger":"controller","message":"initialized nodeclaim","commit":"6174c75","controller":"nodeclaim.lifecycle","controllerGroup":"karpenter.sh","controllerKind":"NodeClaim","NodeClaim":{"name":"airflow-p2kj5"},"namespace":"","name":"airflow-p2kj5","reconcileID":"2984becf-6605-4f02-88cb-1589cfc0acef","provider-id":"aws:///us-west-2b/i-xxxxxxx","Node":{"name":"ip-172-18-31-105.us-west-2.compute.internal"},"allocatable":{"cpu":"15500m","ephemeral-storage":"174592944937","hugepages-1Gi":"0","hugepages-2Mi":"0","memory":"27959808Ki","pods":"110"}}
{"level":"INFO","time":"2024-10-30T05:14:18.369Z","logger":"controller","message":"annotated nodeclaim","commit":"6174c75","controller":"nodeclaim.termination","controllerGroup":"karpenter.sh","controllerKind":"NodeClaim","NodeClaim":{"name":"airflow-p2kj5"},"namespace":"","name":"airflow-p2kj5","reconcileID":"c90b4154-a252-4e3a-b88f-4458f0067350","Node":{"name":"ip-172-18-31-105.us-west-2.compute.internal"},"provider-id":"aws:///us-west-2b/i-xxxxxxx","karpenter.sh/nodeclaim-termination-timestamp":"2024-10-30T05:29:18Z"}
{"level":"INFO","time":"2024-10-30T05:30:22.043Z","logger":"controller","message":"deleted nodeclaim","commit":"6174c75","controller":"nodeclaim.termination","controllerGroup":"karpenter.sh","controllerKind":"NodeClaim","NodeClaim":{"name":"airflow-p2kj5"},"namespace":"","name":"airflow-p2kj5","reconcileID":"3243e713-0e24-4b50-9ffc-91c77a550006","Node":{"name":"ip-172-18-31-105.us-west-2.compute.internal"},"provider-id":"aws:///us-west-2b/i-xxxxxxx"}

@orz-nil

orz-nil commented Oct 30, 2024

Is this caused by the new terminationGracePeriod feature?
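The timestamps in the manifest posted above are at least consistent with that: the karpenter.sh/nodeclaim-termination-timestamp annotation equals deletionTimestamp plus the 15-minute spec.terminationGracePeriod. A quick sanity check with the values copied from the manifest:

```python
from datetime import datetime, timedelta

# Values copied from the NodeClaim manifest posted above.
deletion_timestamp = datetime.fromisoformat("2024-10-30T05:14:18+00:00")   # metadata.deletionTimestamp
termination_deadline = datetime.fromisoformat("2024-10-30T05:29:18+00:00") # termination-timestamp annotation
termination_grace_period = timedelta(minutes=15)                           # spec.terminationGracePeriod: 15m0s

# The annotation is exactly the deletion time plus the grace period.
assert deletion_timestamp + termination_grace_period == termination_deadline
```

If that holds generally, the annotation only marks the drain deadline once deletion has already begun; it says nothing about why the deletion started in the first place.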

@project-administrator

project-administrator commented Oct 30, 2024

I just noticed this happening during the first Karpenter startup after the v1 upgrade, on every cluster we upgraded. I followed the v1 upgrade procedure, and everything appeared to go smoothly. However, during that first startup Karpenter taints nodes that are supposed to be disrupted only on the 1st day of the month:

  spec:
    disruption:
      budgets:
      - duration: 720h
        nodes: "0"
        schedule: 0 4 1 * 0
      - duration: 4h
        nodes: "10"
        schedule: 0 0 1 * 0
      consolidateAfter: 0s
      consolidationPolicy: WhenEmptyOrUnderutilized

The Karpenter logs show a normal startup without any errors; then it suddenly decides to taint the nodes:

{"level":"INFO","time":"2024-10-30T16:12:54.598Z","logger":"controller","message":"Starting workers","commit":"6174c75","controller":"nodeclaim.tagging","controllerGroup":"karpenter.sh","controllerKind":"NodeClaim","worker count":1}
{"level":"INFO","time":"2024-10-30T16:12:54.704Z","logger":"controller","message":"tainted node","commit":"6174c75","controller":"node.termination","controllerGroup":"","controllerKind":"Node","Node":{"name":"ip-10-16-15-204.eu-west-1.compute.internal"},"namespace":"","name":"ip-10-16-15-204.us-west-1.compute.internal","reconcileID":"7e76855c-9d0a-48ab-b0a1-2b163ec67e20","taint.Key":"karpenter.sh/disrupted","taint.Value":"","taint.Effect":"NoSchedule"}

I'm going to try increasing the auto-generated consolidateAfter: 0s value to several minutes...

We're running EKS 1.29, and this happened after upgrading Karpenter from 0.37.5 to 1.0.6.

Here are the resources:

NodePool

apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  annotations:
    compatibility.karpenter.sh/v1beta1-nodeclass-reference: '{"name":"core-infra"}'
    karpenter.sh/nodepool-hash: "7357037288634444389"
    karpenter.sh/nodepool-hash-version: v3
    karpenter.sh/stored-version-migrated: "true"
    kustomize.toolkit.fluxcd.io/prune: disabled
    kustomize.toolkit.fluxcd.io/reconcile: disabled
  creationTimestamp: "2024-07-03T15:34:50Z"
  generation: 4
  labels:
    partition: core-infra
    tier: Cluster
    type: snowflake
    workload: core-infra
  name: core-infra-snowflake
spec:
  disruption:
    budgets:
    - duration: 720h
      nodes: "0"
      schedule: 0 4 1 * 0
    - duration: 4h
      nodes: "10"
      schedule: 0 0 1 * 0
    consolidateAfter: 0s
    consolidationPolicy: WhenEmptyOrUnderutilized
  limits:
    cpu: "1000"
    memory: 1000Gi
  template:
    metadata:
      labels:
        partition: core-infra
        tier: Cluster
        type: snowflake
        workload: core-infra
    spec:
      expireAfter: 170h
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: core-infra
      requirements:
      - key: karpenter.k8s.aws/instance-category
        operator: In
        values:
        - a
        - m
        - c
        - r
        - x
        - z
        - i
        - im
        - is
        - hpc
      - key: karpenter.k8s.aws/instance-generation
        operator: Gt
        values:
        - "2"
      - key: karpenter.k8s.aws/instance-cpu
        operator: In
        values:
        - "4"
        - "8"
        - "16"
        - "24"
        - "32"
        - "48"
        - "64"
      - key: kubernetes.io/arch
        operator: In
        values:
        - arm64
        - amd64
      - key: karpenter.sh/capacity-type
        operator: In
        values:
        - on-demand

NodeClass:

apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  annotations:
    karpenter.k8s.aws/ec2nodeclass-hash: "3007115448686270386"
    karpenter.k8s.aws/ec2nodeclass-hash-version: v3
    karpenter.sh/stored-version-migrated: "true"
    kustomize.toolkit.fluxcd.io/prune: disabled
    kustomize.toolkit.fluxcd.io/reconcile: disabled
  creationTimestamp: "2024-07-03T15:34:49Z"
  finalizers:
  - karpenter.k8s.aws/termination
  generation: 2
  labels:
    partition: core-infra
    tier: Cluster
    type: any
    workload: core-infra
  name: core-infra
spec:
  amiSelectorTerms:
  - alias: bottlerocket@latest
  blockDeviceMappings:
  - deviceName: /dev/xvda
    ebs:
      deleteOnTermination: true
      volumeSize: 10Gi
      volumeType: gp3
  - deviceName: /dev/xvdb
    ebs:
      deleteOnTermination: true
      volumeSize: 100Gi
      volumeType: gp3
  metadataOptions:
    httpEndpoint: enabled
    httpProtocolIPv6: disabled
    httpPutResponseHopLimit: 2
    httpTokens: required
  role: ec2-node-eks-worker-role
  securityGroupSelectorTerms:
  - tags:
      karpenter.sh/discovery: euw1-cluster
  subnetSelectorTerms:
  - tags:
      karpenter.sh/discovery: euw1-cluster
  tags:
    environment: production
    partition: core-infra
    tier: Cluster
    type: any
    workload: core-infra
  userData: |
    # https://github.com/bottlerocket-os/bottlerocket/blob/develop/README.md#description-of-settings
    [settings.kubernetes]
    # Add some labels
    [settings.kubernetes.node-labels]

New NodeClaim (I don't have the manifest of the old, terminated NodeClaim):

apiVersion: karpenter.sh/v1
kind: NodeClaim
metadata:
  annotations:
    compatibility.karpenter.k8s.aws/cluster-name-tagged: "true"
    compatibility.karpenter.k8s.aws/kubelet-drift-hash: "15379597991425564585"
    karpenter.k8s.aws/ec2nodeclass-hash: "3007115448686270386"
    karpenter.k8s.aws/ec2nodeclass-hash-version: v3
    karpenter.k8s.aws/tagged: "true"
    karpenter.sh/nodepool-hash: "7357037288634444389"
    karpenter.sh/nodepool-hash-version: v3
    karpenter.sh/stored-version-migrated: "true"
  creationTimestamp: "2024-10-30T16:13:55Z"
  finalizers:
  - karpenter.sh/termination
  generateName: core-infra-snowflake-
  generation: 1
  labels:
    karpenter.k8s.aws/instance-category: m
    karpenter.k8s.aws/instance-cpu: "4"
    karpenter.k8s.aws/instance-cpu-manufacturer: aws
    karpenter.k8s.aws/instance-ebs-bandwidth: "4750"
    karpenter.k8s.aws/instance-encryption-in-transit-supported: "false"
    karpenter.k8s.aws/instance-family: m6g
    karpenter.k8s.aws/instance-generation: "6"
    karpenter.k8s.aws/instance-hypervisor: nitro
    karpenter.k8s.aws/instance-memory: "16384"
    karpenter.k8s.aws/instance-network-bandwidth: "1250"
    karpenter.k8s.aws/instance-size: xlarge
    karpenter.sh/capacity-type: on-demand
    karpenter.sh/nodepool: core-infra-snowflake
    kubernetes.io/arch: arm64
    kubernetes.io/os: linux
    node.kubernetes.io/instance-type: m6g.xlarge
    partition: core-infra
    tier: Cluster
    topology.k8s.aws/zone-id: euw1-az3
    topology.kubernetes.io/region: eu-west-1
    topology.kubernetes.io/zone: eu-west-1b
    type: snowflake
    workload: core-infra
  name: core-infra-snowflake-877bv
  ownerReferences:
  - apiVersion: karpenter.sh/v1
    blockOwnerDeletion: true
    kind: NodePool
    name: core-infra-snowflake
    uid: bc31cca8-1b3d-4ddc-988f-d385a80cf0c5
  resourceVersion: "121773039"
  uid: 32c43688-bb4a-43ba-a466-6515e37d9566
spec:
  expireAfter: 170h
  nodeClassRef:
    group: karpenter.k8s.aws
    kind: EC2NodeClass
    name: core-infra
  requirements:
  - key: partition
    operator: In
    values:
    - core-infra
  - key: karpenter.k8s.aws/instance-category
    operator: In
    values:
    - a
    - c
    - hpc
    - i
    - im
    - is
    - m
    - r
    - x
    - z
  - key: karpenter.k8s.aws/instance-cpu
    operator: In
    values:
    - "16"
    - "24"
    - "32"
    - "4"
    - "48"
    - "64"
    - "8"
  - key: type
    operator: In
    values:
    - snowflake
  - key: kubernetes.io/arch
    operator: In
    values:
    - amd64
    - arm64
  - key: karpenter.k8s.aws/instance-generation
    operator: Gt
    values:
    - "2"
  - key: workload
    operator: In
    values:
    - core-infra
  - key: karpenter.sh/capacity-type
    operator: In
    values:
    - on-demand
  - key: topology.kubernetes.io/zone
    operator: In
    values:
    - eu-west-1b
  - key: tier
    operator: In
    values:
    - Cluster
  - key: node.k8s.org/role
    operator: In
    values:
    - core
  - key: karpenter.sh/nodepool
    operator: In
    values:
    - core-infra-snowflake
  - key: node.kubernetes.io/instance-type
    operator: In
    values:
    - c5.2xlarge
    - c5a.2xlarge
    - c5ad.2xlarge
    - c5n.xlarge
    - c6a.2xlarge
    - c6g.2xlarge
    - c6gd.2xlarge
    - c6gn.2xlarge
    - c6i.2xlarge
    - c7g.2xlarge
    - c7i-flex.2xlarge
    - c7i.2xlarge
    - i3.xlarge
    - i4g.xlarge
    - i4i.xlarge
    - im4gn.xlarge
    - m3.xlarge
    - m4.xlarge
    - m5.xlarge
    - m5a.2xlarge
    - m5a.xlarge
    - m5ad.xlarge
    - m5d.xlarge
    - m5dn.xlarge
    - m5n.xlarge
    - m5zn.xlarge
    - m6a.2xlarge
    - m6a.xlarge
    - m6g.2xlarge
    - m6g.xlarge
    - m6gd.xlarge
    - m6i.xlarge
    - m6id.xlarge
    - m6idn.xlarge
    - m6in.xlarge
    - m7a.xlarge
    - m7g.2xlarge
    - m7g.xlarge
    - m7gd.xlarge
    - m7i-flex.xlarge
    - m7i.xlarge
    - r3.xlarge
    - r4.xlarge
    - r5.xlarge
    - r5a.xlarge
    - r5ad.xlarge
    - r5b.xlarge
    - r5d.xlarge
    - r5dn.xlarge
    - r5n.xlarge
    - r6a.xlarge
    - r6g.xlarge
    - r6gd.xlarge
    - r6i.xlarge
    - r6id.xlarge
    - r6in.xlarge
    - r7a.xlarge
    - r7g.xlarge
    - r7gd.xlarge
    - r7i.xlarge
  resources:
    requests:
      cpu: 3830m
      ephemeral-storage: 200Mi
      memory: 7193231360500m
      pods: "11"

Here is the complete karpenter log:
karpenter_logs1.json

It looks like Karpenter works fine after the upgrade, so in our case the impact is limited to the upgrade window: during the v1 upgrade, Karpenter recreates all nodes without paying much attention to PDBs and ignores the disruption budgets.
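A possible mitigation for the upgrade window, based on the Karpenter disruption documentation rather than anything verified in this issue: pods can opt out of voluntary disruption with the karpenter.sh/do-not-disrupt annotation, which prevents Karpenter from voluntarily disrupting the nodes they run on:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: critical-workload   # hypothetical pod name, for illustration only
  annotations:
    # Tells Karpenter not to voluntarily disrupt the node this pod is scheduled on.
    karpenter.sh/do-not-disrupt: "true"
```

As I read the v1 docs, spec.terminationGracePeriod can still bound how long such pods block an in-flight drain, so this guards against a disruption starting rather than stopping one already underway.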

@project-administrator

project-administrator commented Nov 5, 2024

Would anyone be able to point me to the line of code that generates these messages?

"message":"tainted node"
"controller":"node.termination"
"taint.Key":"karpenter.sh/disrupted","taint.Value":"","taint.Effect":"NoSchedule"

I'm having difficulty finding these strings in the Karpenter source code...

Or does it somehow set the taint incorrectly here?
https://github.com/aws/karpenter-provider-aws/blob/main/pkg/providers/amifamily/bootstrap/bottlerocket.go#L79

@frimik

frimik commented Nov 12, 2024

(Karpenter v1.0.6):

Similarly, I'm not happy with the Karpenter logs. I think "reason": "drifted" is pretty much the only information you get from that controller; which of the several drift sources actually caused the drift? Or am I missing something?

I suppose that if the NodeClaims are the resources that carry the coherent status and events, summarized in a useful way, then the NodeClaims need to remain after the node deletion has happened, or be archived somehow. Once they're gone, the easily accessible information is gone too!

Right now, Karpenter seems to have too many blind spots for production confidence, compared to the calmness and "it just works" nature of Cluster Autoscaler.
