
Karpenter Won't Over-Provision GPUs #7038

Open
tcatling opened this issue Sep 18, 2024 · 0 comments
Labels
bug Something isn't working needs-triage Issues that need to be triaged


Description

(Apologies if this isn't AWS-specific; I'm not familiar with the internals, but I'm happy to repost to kubernetes-sigs/karpenter if that would be more useful.)

Observed Behavior:

If Karpenter is limited to provisioning nodes with a fixed set of GPU counts (e.g. 2, 4, or 8), it will refuse to create a node for a pod that requests any other number (e.g. 1, 3, 5, 6, or 7).

Note that when a node already exists, scheduling a pod onto a node with more GPUs than it requests works as expected.
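To make the distinction concrete, here is a minimal sketch of the two selection policies. This is not Karpenter's actual code, and the instance shapes are hypothetical stand-ins for g-family sizes; it only illustrates why a 5-GPU request finds no candidate under exact-fit matching but does under over-provisioning:

```python
# Illustrative sketch only -- not Karpenter's scheduling code.
# Hypothetical stand-ins for g-family instance shapes.
INSTANCE_GPU_COUNTS = {"g5.xlarge": 1, "g5.12xlarge": 4, "g5.48xlarge": 8}

def exact_fit(requested: int) -> list[str]:
    """Only offer instance types whose GPU count matches exactly
    (the behaviour observed in this issue)."""
    return [name for name, gpus in INSTANCE_GPU_COUNTS.items() if gpus == requested]

def over_provision(requested: int) -> list[str]:
    """Offer any instance type with at least the requested GPUs
    (the behaviour expected, matching how CPU/memory are handled)."""
    return [name for name, gpus in INSTANCE_GPU_COUNTS.items() if gpus >= requested]

# A 5-GPU request matches nothing under exact fit, but an 8-GPU type
# satisfies it when over-provisioning is allowed.
```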

Expected Behavior:

I would expect Karpenter to over-provision GPUs when necessary, in the same way it over-provisions CPU and memory.

Reproduction Steps (Please include YAML):

Follow the Karpenter 'getting started' guide on EKS and install the NVIDIA device plugin.
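For reference, the device plugin is typically installed as a DaemonSet from the NVIDIA/k8s-device-plugin repository (the release tag below is an assumption; check that project's README for the current version):

```shell
# Install the NVIDIA device plugin so nodes advertise nvidia.com/gpu.
# v0.16.2 is an assumed recent tag -- substitute the current release.
kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v0.16.2/deployments/static/nvidia-device-plugin.yml
```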

Use the following NodePool and EC2NodeClass:

---
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: gpu
  namespace: karpenter
spec:
  template:
    spec:
      startupTaints:
        - key: node.cilium.io/agent-not-ready
          value: "true"
          effect: NoExecute
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
        - key: kubernetes.io/os
          operator: In
          values: ["linux"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand"]
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: ["g"]
        - key: karpenter.k8s.aws/instance-generation
          operator: In
          values: ["4", "5"]
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: gpu
      expireAfter: 720h # 30 * 24h = 720h
  limits:
    cpu: 1000
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 1m
---
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: gpu
  namespace: karpenter
spec:
  tags:
    Name: "{{ .Values.clusterName }}-gpu-karpenter"
  amiFamily: AL2 # Amazon Linux 2
  role: "KarpenterNodeRole-{{ .Values.clusterName }}" # replace with your cluster name
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: "{{ .Values.clusterName }}" # replace with your cluster name
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: "{{ .Values.clusterName }}" # replace with your cluster name
  amiSelectorTerms:
    # ARM_AMI_ID="$(aws ssm get-parameter --name /aws/service/eks/optimized-ami/${K8S_VERSION}/amazon-linux-2-arm64/recommended/image_id --query Parameter.Value --output text)"
    # AMD_AMI_ID="$(aws ssm get-parameter --name /aws/service/eks/optimized-ami/${K8S_VERSION}/amazon-linux-2/recommended/image_id --query Parameter.Value --output text)"
    # GPU_AMI_ID="$(aws ssm get-parameter --name /aws/service/eks/optimized-ami/${K8S_VERSION}/amazon-linux-2-gpu/recommended/image_id --query Parameter.Value --output text)"
    - id: ami-0af7fb740c9da69b3 # GPU Amazon Linux 2 18/09/2024

Provisioning the following pod works and causes a 4-GPU node to be created:

apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
spec:
  restartPolicy: Never
  containers:
    - name: cuda-container
      image: "nvidia/cuda:11.0.3-base-ubuntu20.04"
      command: ["nvidia-smi"]
      resources:
        requests:
          nvidia.com/gpu: 4
        limits:
          nvidia.com/gpu: 4
  tolerations:
  - key: nvidia.com/gpu
    operator: Exists
    effect: NoSchedule

The following will succeed once the previously created pod has finished, provided it is run before the node is cleaned up (the pod fits on the pre-existing node with room to spare). If it is run at a time that would require creating a new node, it will fail:

apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
spec:
  restartPolicy: Never
  containers:
    - name: cuda-container
      image: "nvidia/cuda:11.0.3-base-ubuntu20.04"
      command: ["nvidia-smi"]
      resources:
        requests:
          nvidia.com/gpu: 3
        limits:
          nvidia.com/gpu: 3
  tolerations:
  - key: nvidia.com/gpu
    operator: Exists
    effect: NoSchedule

The following will fail, because it doesn't fit on the previously provisioned node and Karpenter won't provision a node with 8 GPUs:

apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod3
spec:
  restartPolicy: Never
  containers:
    - name: cuda-container
      image: "nvidia/cuda:11.0.3-base-ubuntu20.04"
      command: ["nvidia-smi"]
      resources:
        requests:
          nvidia.com/gpu: 5
        limits:
          nvidia.com/gpu: 5
  tolerations:
  - key: nvidia.com/gpu
    operator: Exists
    effect: NoSchedule

with the following error:

  Warning  FailedScheduling  18s   karpenter  Failed to schedule pod, incompatible with nodepool "gpu", daemonset overhead={"cpu":"280m","memory":"130Mi","pods":"6"}, no instance type satisfied resources {"cpu":"280m","memory":"130Mi","nvidia.com/gpu":"5","pods":"7"} and requirements karpenter.k8s.aws/instance-category In [g], karpenter.k8s.aws/instance-generation In [4 5], karpenter.sh/capacity-type In [on-demand], karpenter.sh/nodepool In [gpu], kubernetes.io/arch In [amd64], kubernetes.io/os In [linux] (no instance type which had enough resources and the required offering met the scheduling requirements); incompatible with nodepool "default", daemonset overhead={"cpu":"280m","memory":"130Mi","pods":"6"}, no instance type satisfied resources {"cpu":"280m","memory":"130Mi","nvidia.com/gpu":"5","pods":"7"} and requirements karpenter.k8s.aws/instance-category In [c m r], karpenter.k8s.aws/instance-generation Exists >2, karpenter.sh/capacity-type In [on-demand], karpenter.sh/nodepool In [default], kubernetes.io/arch In [amd64], kubernetes.io/os In [linux] (no instance type which had enough resources and the required offering met the scheduling requirements)
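Instance types with more than 4 GPUs do exist within the NodePool's requirements, which is a way to sanity-check that over-provisioning should have been possible. A hedged way to verify this yourself (assumes a configured AWS CLI; the `g5.*` filter matches the generation-5 g family used above):

```shell
# List g5 instance types with their GPU counts; g5.48xlarge reports 8 GPUs,
# so an over-provisioned placement for the 5-GPU pod should be feasible.
aws ec2 describe-instance-types \
  --filters "Name=instance-type,Values=g5.*" \
  --query "InstanceTypes[].[InstanceType, GpuInfo.Gpus[0].Count]" \
  --output text
```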

Versions:

  • Chart Version: 1.0.1
  • Kubernetes Version (kubectl version):
Client Version: v1.31.1
Kustomize Version: v5.4.2
Server Version: v1.30.4-eks-a737599
  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment
@tcatling tcatling added bug Something isn't working needs-triage Issues that need to be triaged labels Sep 18, 2024