Replies: 2 comments
-
Note that your link gave a 404 and was probably referring to a different implementation of Whisper. The model itself was trained on monolingual audio, so you can't directly give it a list of languages and expect it to code-switch between them automatically (without direction, Whisper can sometimes transcribe code-switched speech successfully, but it is unreliable). Workarounds are outlined in #2009: first diarize the audio with pyannote, then detect the language of each segment, transcribe each segment monolingually, and stitch the results together. That's about the extent of what you can do with Whisper. Instead of pyannote, you could also try running tiny Whisper with a prompt that puts a hyphen in front of each utterance, which can nudge Whisper toward a primitive sort of speaker-change detection. Outside of Whisper, you can also look at solutions based on Meta's MMS fine-tuned for multilingual code switching.
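The diarize-then-stitch workaround above could be sketched roughly as follows. This is only an illustration of the stitching logic: the actual pyannote diarization, language detection, and Whisper transcription calls are stubbed out as a hypothetical `transcribe` callback, since the real pipelines require model downloads and auth tokens.

```python
# Sketch of: diarize -> detect language per segment -> transcribe each
# segment monolingually -> stitch the results back in time order.
# `segments` would come from a diarization/language-ID step and is a list
# of (start_sec, end_sec, language_code) tuples. `transcribe` stands in
# for a per-segment, single-language Whisper call.

def stitch_segments(segments, transcribe):
    """Transcribe each segment in its own language and join in time order."""
    parts = []
    for start, end, lang in sorted(segments, key=lambda s: s[0]):
        # One monolingual Whisper pass per segment, forced to `lang`.
        parts.append(transcribe(start, end, lang))
    return " ".join(parts)

# Example with a stub in place of a real Whisper call:
def fake_transcribe(start, end, lang):
    return f"<{lang}:{start}-{end}>"

print(stitch_segments([(5.0, 9.0, "en"), (0.0, 5.0, "zh")], fake_transcribe))
```

In a real pipeline, `transcribe` would slice the audio to `[start, end]` and run Whisper with the language explicitly set for that slice.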
-
Just to correct the above link: https://huggingface.co/blog/fine-tune-whisper
-
In the "Load WhisperTokenizer" section at https://huggingface.co/blog/fine-tune-whisper, it is mentioned that "We simply have to specify the target language and the task."
Does that mean the languages are kept separate and the output cannot mix them, for example English words mixed into Chinese text?
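For context on why the output is pinned to one language: Whisper specifies the language and task as special tokens prepended to the decoder prompt, e.g. `<|startoftranscript|><|zh|><|transcribe|>`, so a single language token governs the whole transcription. A minimal sketch of how that prompt is assembled (the token names follow Whisper's special-token scheme; the helper function itself is hypothetical, not a library API):

```python
# Illustrative only: builds the decoder prompt string Whisper conditions on.
# Exactly one language token appears, which is why the model is steered
# toward a single language for the entire output.

def decoder_prompt(language_token: str, task: str) -> str:
    """Assemble Whisper-style special tokens, e.g. for Chinese transcription."""
    return f"<|startoftranscript|><|{language_token}|><|{task}|>"

print(decoder_prompt("zh", "transcribe"))
```

In practice code-switched words can still appear in the output, but as the first comment notes, this is not something the model was trained to do reliably.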