Keep model in VRAM? #516
ScissorSnips started this conversation in General

I'm calling Whisper from PowerShell with --device cuda, and it takes around 18 seconds to load the large model into VRAM. Is there a way to keep the model loaded, or is that just how it is when you call something from the command line/PowerShell? Would I be better off writing a Python app? Could it load the model once and then stay in memory until it's called externally to do the transcription?

Replies: 1 comment

-
In a Python app you have more control, since you load the model explicitly. For instance, I use a small Flask service with a simple API that keeps the model loaded between calls.
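
A minimal sketch of that approach, assuming the openai-whisper package and Flask; the /transcribe endpoint and the JSON shape are my own illustrative choices, not part of either library:

```python
# Flask service that loads the Whisper model once at startup and keeps
# it resident in VRAM between requests (a sketch, not the exact service
# described above).
import whisper
from flask import Flask, jsonify, request

app = Flask(__name__)

# Loaded once when the process starts; subsequent requests reuse it,
# so you only pay the ~18 s load time a single time.
model = whisper.load_model("large", device="cuda")

@app.post("/transcribe")
def transcribe():
    # Expects JSON like {"path": "C:/audio/clip.wav"} pointing at a
    # local audio file readable by this process.
    path = request.get_json()["path"]
    result = model.transcribe(path)
    return jsonify({"text": result["text"]})

if __name__ == "__main__":
    app.run(host="127.0.0.1", port=5000)
```

Once it's running, each transcription from PowerShell becomes a quick HTTP call (e.g. via `Invoke-RestMethod`) instead of a fresh model load.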