Transcribe on GPU #2329

Open
take0x wants to merge 3 commits into main
Conversation

take0x commented Sep 9, 2024

Ideally, log_mel_spectrogram() should use model.device when transcribing.
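
For context, a minimal sketch of the kind of change being proposed (this illustrates the idea against the mel-spectrogram call site in whisper's transcribe(); it is not necessarily the exact diff in the commits):

```python
# Sketch of the idea (not necessarily the exact patch in this PR): in
# whisper/transcribe.py the mel spectrogram is currently computed without a
# device argument, so it runs on the CPU even when the model lives on the GPU.
mel = log_mel_spectrogram(
    audio,
    model.dims.n_mels,
    padding=N_SAMPLES,
    device=model.device,  # proposed: follow the device chosen in load_model()
)
```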

RahulVadisetty91 left a comment

Consider simplifying the condition to make it more concise.

@ExtReMLapin

Benchmarks?

kittsil (Contributor) commented Sep 21, 2024

I am not sure that this is a valuable change.

While it is not a robust benchmark, I did run an experiment on my local machine.
10 runs of log_mel_spectrogram() on a 177-minute audio file (times in seconds):

CPU:
  mean: 0.948
  std_dev: 0.208

GPU:
  mean: 2.67
  std_dev: 1.20

Machine specs:

  • CPU: i5-13500HX
  • GPU: GeForce RTX 4050 Laptop GPU

Note: the audio takes ~30 s to load and ~330 s to transcribe, so a difference of one or two seconds in this step seems largely moot either way.
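
(For reference, a rough sketch of how a timing comparison like this could be reproduced; the file path is a placeholder and this is not the exact script behind the numbers above.)

```python
# Rough timing sketch: compare log_mel_spectrogram() on CPU vs. GPU.
import statistics
import time

import torch
import whisper
from whisper.audio import log_mel_spectrogram

audio = whisper.load_audio("long_audio.wav")  # placeholder; ~177 min file above

for device in ("cpu", "cuda"):
    times = []
    for _ in range(10):
        start = time.perf_counter()
        log_mel_spectrogram(audio, device=device)
        if device == "cuda":
            torch.cuda.synchronize()  # include pending GPU work in the timing
        times.append(time.perf_counter() - start)
    print(f"{device}: mean={statistics.mean(times):.3f}s "
          f"std_dev={statistics.stdev(times):.3f}s")
```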

take0x (Author) commented Sep 22, 2024

What matters is that the device specified in load_model() should also be used when transcribing, not the exact benchmark numbers.

kittsil (Contributor) commented Sep 22, 2024

@take0x, I was using my GPU to transcribe.

> What matters is that the device specified in load_model() should also be used when transcribing, not the exact benchmark numbers.

The device specified is used to transcribe. The log_mel_spectrogram() computation, which is a preprocessing step and NOT part of the NN model, defaults to using the CPU.

I think most consumers of the code would say "the fastest device available should be used to create the mel spectrogram." Given the nature of the computation, a CPU is almost always going to be the faster device (and should therefore be the default), regardless of the device on which the NN (a very different computation) runs.

You're more likely to get a PR approved if it includes an optional mel_spectrogram_device parameter that allows that computation to be run on a specific device, but even then... I'm not sure this has much value compared to the noise of adding another parameter.
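
(A sketch of what that opt-in parameter might look like; mel_spectrogram_device is the hypothetical name from this comment, and the rest of transcribe()'s real signature is omitted.)

```python
# Hypothetical opt-in parameter (name taken from the suggestion above);
# most of transcribe()'s real parameters are omitted for brevity.
def transcribe(
    model,
    audio,
    *,
    mel_spectrogram_device=None,  # None keeps today's CPU default
    **decode_options,
):
    mel = log_mel_spectrogram(
        audio,
        model.dims.n_mels,
        padding=N_SAMPLES,
        device=mel_spectrogram_device,
    )
    ...
```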

take0x (Author) commented Sep 22, 2024

@kittsil
Thank you for your advice.

In my case, when transcribing large amounts of audio data, the process has sometimes crashed on the CPU but completed normally on the GPU. I think it would be useful to be able to run this step on a device other than the CPU.

I'll try adding the mel_spectrogram_device parameter based on your advice.
