PDB / node-scale prevention while workflow pods are running #3814
Labels: community (Community contribution), enhancement (New feature or request), needs triage (Requires review from the maintainers)
What would you like added?
I noticed that while multiple runner-set workflow pods are running (e.g. a job that requires 6 runners), if 4 of them complete, node requests can drop below the scale-down utilization threshold (for example, the threshold is set to 0.5 and requests fall to 40% because 4 workflow jobs completed). Node scale-downs can then potentially evict pods whose jobs are still in progress. Since ARC / GitHub Actions jobs are not idempotent, those jobs can suddenly fail. Is it possible to apply a finalizer / PDB to the workflow pods (spun up with container mode `kubernetes` for Docker-in-Docker builds)?
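As a workaround today, a PodDisruptionBudget targeting the runner pods can block voluntary evictions (including cluster-autoscaler node drains). This is only a sketch: the namespace and the selector label shown here are assumptions; check the labels your runner pods actually carry before applying it.

```yaml
# Hypothetical PDB blocking all voluntary evictions of runner pods.
# The label key/value below is an assumption; inspect your pods
# (kubectl get pods --show-labels) for the real selector.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: runner-pdb
  namespace: arc-runners        # assumed namespace
spec:
  maxUnavailable: 0             # refuse every voluntary eviction
  selector:
    matchLabels:
      app: my-runner-set        # assumed pod label
```

With `maxUnavailable: 0`, the eviction API rejects drains of matching pods, so the autoscaler cannot remove their node; involuntary disruptions (node failure, OOM kills) are not covered.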
Let me know if there is an existing feature that allows graceful termination, i.e. prevents node scale-downs from affecting running workflow pods.
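Another commonly used mechanism is the cluster-autoscaler `safe-to-evict` annotation, which tells the autoscaler not to drain a node while an annotated pod is running on it. The snippet below is a sketch of adding it to a runner pod template; the surrounding `template` structure and container image are assumptions based on typical scale-set values files, not a confirmed ARC configuration key.

```yaml
# Sketch: annotate the runner pod template so cluster-autoscaler
# will not evict these pods during scale-down. The pod-template
# layout here is an assumption; check your Helm chart's values.
template:
  metadata:
    annotations:
      cluster-autoscaler.kubernetes.io/safe-to-evict: "false"
  spec:
    containers:
      - name: runner
        image: ghcr.io/actions/actions-runner:latest  # assumed image
```

Unlike a PDB, this only influences the cluster autoscaler's own scale-down logic; it does not block `kubectl drain` or other eviction sources.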
Note: feature requests to integrate vendor-specific cloud tools (e.g. `awscli`, `gcloud-sdk`, `azure-cli`) will likely be rejected, as the runner image aims to be vendor agnostic.
Why is this needed?
We don't want individual jobs within a GHA matrix build to suddenly fail due to a node scale-down.