A Kubernetes scheduler designed for smart scheduling with llmaz.
vScheduler maintains multiple plugins for scheduling LLM workloads.
For example, a llama2-7B model can run on a single A100 GPU, and it can also run on a single A10 GPU. This interchangeability of GPU types is what we call fungibility.
With the resourceFungibility plugin, this is straightforward to achieve, with support for at most 8 alternative GPU types per model.
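
To make the idea concrete, here is a minimal Go sketch of fungible GPU selection. It is not the plugin's actual code: the `Flavor` type, the `pickFlavor` function, and the "first match in preference order" policy are all assumptions made for illustration. Only the limit of 8 alternatives comes from the description above.

```go
package main

import (
	"errors"
	"fmt"
)

// maxFlavors mirrors the "at most 8 alternative GPU types" limit described above.
const maxFlavors = 8

// Flavor is a hypothetical description of one acceptable GPU type for a model.
type Flavor struct {
	Name  string // GPU type, e.g. "a100"
	Count int    // number of GPUs of this type the model needs
}

// pickFlavor returns the first flavor, in the order given, whose GPU type has
// enough free devices. free maps a GPU type name to its free device count.
// Treating list order as preference order is an assumption of this sketch.
func pickFlavor(flavors []Flavor, free map[string]int) (Flavor, error) {
	if len(flavors) > maxFlavors {
		return Flavor{}, fmt.Errorf("got %d flavors, at most %d are allowed", len(flavors), maxFlavors)
	}
	for _, f := range flavors {
		if free[f.Name] >= f.Count {
			return f, nil
		}
	}
	return Flavor{}, errors.New("no flavor fits the free GPUs")
}

func main() {
	// llama2-7B accepts either one A100 or one A10.
	flavors := []Flavor{{Name: "a100", Count: 1}, {Name: "a10", Count: 1}}
	// No A100 is free right now, so the model falls back to an A10.
	free := map[string]int{"a100": 0, "a10": 4}

	f, err := pickFlavor(flavors, free)
	if err != nil {
		fmt.Println("unschedulable:", err)
		return
	}
	fmt.Println("scheduled on", f.Name)
}
```

In this sketch the workload stays pending only when none of the listed GPU types has capacity, which is the practical benefit of fungibility: a shortage of one GPU type does not block the model.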