Keep model in VRAM? #516
ScissorSnips started this conversation in General

I'm calling Whisper from PowerShell with --device cuda, and it takes around 18 seconds to load the large model into VRAM. Is there a way to keep the model loaded, or is that just how it is when you call something from the command line/PowerShell? Would I be better off writing a Python app? Could it load the model once and then stay in memory until it's called externally to do the transcription?

Replies: 1 comment

-
In a Python app you have more control, since you load the model explicitly. For instance, I use a small Flask service with a simple API that keeps the model loaded between calls.
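
A minimal sketch of that approach, assuming the openai-whisper package and Flask; the /transcribe endpoint and the JSON shape are my own illustrative choices, not part of either library:

```python
# Flask service that loads the Whisper model once at startup and keeps
# it resident in VRAM between requests (a sketch, not the exact service
# described above).
import whisper
from flask import Flask, jsonify, request

app = Flask(__name__)

# Loaded once when the process starts; subsequent requests reuse it,
# so you only pay the ~18 s load time a single time.
model = whisper.load_model("large", device="cuda")

@app.post("/transcribe")
def transcribe():
    # Expects JSON like {"path": "C:/audio/clip.wav"} pointing at a
    # local audio file readable by this process.
    path = request.get_json()["path"]
    result = model.transcribe(path)
    return jsonify({"text": result["text"]})

if __name__ == "__main__":
    app.run(host="127.0.0.1", port=5000)
```

Once it's running, each transcription from PowerShell becomes a quick HTTP call (e.g. via `Invoke-RestMethod`) instead of a fresh model load.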