PDB / node-scale prevention while workflow pods are running #3814
Labels: community (Community contribution), enhancement (New feature or request), needs triage (Requires review from the maintainers)
What would you like added?
I noticed that while multiple runner-set workflow pods are running (e.g. a job that requires 6 runners), if 4 of them complete, node requests can drop below the scale-down utilization threshold (for example, the threshold is set to 0.5 and requests fall to 40% because 4 workflow jobs completed). Node scale-downs can then potentially evict pods whose jobs are still in progress. Since ARC / GitHub Actions jobs are not idempotent, those jobs can suddenly fail. Is it possible to apply a finalizer / PDB to the workflow pods (spun up with container mode `kubernetes` for Docker-in-Docker builds)?
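As a workaround today, a PodDisruptionBudget targeting the runner pods can block voluntary evictions (including cluster-autoscaler node drains). This is only a sketch: the namespace and the selector label shown here are assumptions; check the labels your runner pods actually carry before applying it.

```yaml
# Hypothetical PDB blocking all voluntary evictions of runner pods.
# The label key/value below is an assumption; inspect your pods
# (kubectl get pods --show-labels) for the real selector.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: runner-pdb
  namespace: arc-runners        # assumed namespace
spec:
  maxUnavailable: 0             # refuse every voluntary eviction
  selector:
    matchLabels:
      app: my-runner-set        # assumed pod label
```

With `maxUnavailable: 0`, the eviction API rejects drains of matching pods, so the autoscaler cannot remove their node; involuntary disruptions (node failure, OOM kills) are not covered.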
Let me know if there is an existing feature that allows graceful termination, i.e. prevents node scale-downs from affecting running workflow pods.
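Another commonly used mechanism is the cluster-autoscaler `safe-to-evict` annotation, which tells the autoscaler not to drain a node while an annotated pod is running on it. The snippet below is a sketch of adding it to a runner pod template; the surrounding `template` structure and container image are assumptions based on typical scale-set values files, not a confirmed ARC configuration key.

```yaml
# Sketch: annotate the runner pod template so cluster-autoscaler
# will not evict these pods during scale-down. The pod-template
# layout here is an assumption; check your Helm chart's values.
template:
  metadata:
    annotations:
      cluster-autoscaler.kubernetes.io/safe-to-evict: "false"
  spec:
    containers:
      - name: runner
        image: ghcr.io/actions/actions-runner:latest  # assumed image
```

Unlike a PDB, this only influences the cluster autoscaler's own scale-down logic; it does not block `kubectl drain` or other eviction sources.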
Note: feature requests to integrate vendor-specific cloud tools (e.g. `awscli`, `gcloud-sdk`, `azure-cli`) will likely be rejected, as the runner image aims to be vendor agnostic.
Why is this needed?
We don't want individual jobs within a GHA matrix build to suddenly fail due to a node scale-down.