Workbench: Tolerations for specific Pods (GPU) #447
Comments
+1
We need a way to reserve GPU nodes exclusively for pods that request GPU resources, and the current configuration doesn't support this.
Thanks for reporting this! I think you are right that this is less than ideal. If you are trying to set a toleration exclusively on GPU sessions, that may be possible by customizing templates. Customizing templates is generally a pretty advanced feature (and can definitely be tedious / annoying across chart versions), but it should get you going here! Can you share an example of a toleration as you would expect it to be defined on the pod that is launched? I should be able to mock up some Helm values that work with that input!
Sorry for the long delay, and thank you for your reply. One taint we would set on the GPU node is, for example
Hi everybody, we are currently trying to deploy Workbench on our Kubernetes cluster via Helm. Everything works fine, but we have some hardware GPU nodes that should be reserved for Workbench GPU sessions. We have no problems starting GPU sessions, but we can't get the nodes "reserved" for these sessions.
We are trying to do this by tainting the nodes, but we can't get the toleration set exclusively on the GPU sessions. After reading through the chart and other issues in this repo, it seems that tolerations can only be set for all sessions of a Workbench server. We hoped placement-constraints would help us solve this, but they don't work as we expected, because they look at the labels of a node rather than its taints.
Is there any chance to make this work? Are we just missing some documentation or is this totally out of scope?
Thanks in advance for any help or suggestion :)
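For readers following along, here is a minimal Python sketch of how Kubernetes matches tolerations against node taints, which is why the toleration has to sit on the GPU sessions only. It is a simplified model of the documented matching rule, not the scheduler's actual code. The taint key `example.com/gpu` and its value are hypothetical placeholders, not the taint used on our cluster, and the helper names (`tolerates`, `can_schedule`) are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Taint:
    key: str
    value: str
    effect: str  # "NoSchedule", "PreferNoSchedule", or "NoExecute"


@dataclass(frozen=True)
class Toleration:
    key: str = ""
    operator: str = "Equal"  # "Equal" or "Exists"
    value: str = ""
    effect: str = ""  # an empty effect matches every taint effect


def tolerates(tol: Toleration, taint: Taint) -> bool:
    """Simplified version of the Kubernetes toleration/taint matching rule."""
    if tol.effect and tol.effect != taint.effect:
        return False
    if tol.key == "":
        # An empty key is only meaningful with "Exists" and matches any taint.
        return tol.operator == "Exists"
    if tol.key != taint.key:
        return False
    if tol.operator == "Exists":
        return True
    return tol.operator == "Equal" and tol.value == taint.value


def can_schedule(tolerations: list, node_taints: list) -> bool:
    """A pod may land on a node only if every hard taint is tolerated."""
    hard_effects = ("NoSchedule", "NoExecute")  # PreferNoSchedule is soft
    return all(
        any(tolerates(tol, taint) for tol in tolerations)
        for taint in node_taints
        if taint.effect in hard_effects
    )


# Hypothetical taint on a GPU node (placeholder key and value).
GPU_TAINT = Taint(key="example.com/gpu", value="true", effect="NoSchedule")

# What we would like: only GPU sessions carry the matching toleration.
gpu_session_tolerations = [
    Toleration(key="example.com/gpu", operator="Equal",
               value="true", effect="NoSchedule")
]
cpu_session_tolerations = []  # no toleration, so the taint repels the pod

print(can_schedule(gpu_session_tolerations, [GPU_TAINT]))
print(can_schedule(cpu_session_tolerations, [GPU_TAINT]))
```

If the toleration is applied to all sessions, the second check would pass as well and the node is no longer reserved. Note also that a toleration only permits a pod on the tainted node and does not attract it there, while label-based placement attracts pods but does not repel others, so reserving a node usually needs both pieces.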