Add option to carry initial_prompt with the sliding window #2343

kittsil · 2024-09-18T03:11:58Z

Background
Whisper's transcribe() struggles with contextual proper nouns if they appear after the initial prompt has been consumed; see some experimental results here. This solves that issue by allowing the initial "context" prompt to be carried as the sliding window moves through the audio.

Changes
Add an option carry_initial_prompt = False to whisper.transcribe().

When carry_initial_prompt is set to True, initial_prompt is prepended to each internal decode() call's prompt. If there is not enough context space at the start of the prompt, the prompt is left-sliced to make space.
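The behavior described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `build_carried_prompt`, `history_tokens`, and `max_prompt_len` are hypothetical names, and the token ids are made up.

```python
from typing import List


def build_carried_prompt(
    initial_tokens: List[int],
    history_tokens: List[int],
    max_prompt_len: int,
) -> List[int]:
    """Prepend the initial prompt and left-slice the rolling history to fit.

    Hypothetical helper: the names and the budget argument are assumptions
    made for illustration only.
    """
    room = max_prompt_len - len(initial_tokens)
    if room <= 0:
        # The initial prompt alone fills the budget; keep only its tail.
        return initial_tokens[-max_prompt_len:]
    # Keep the most recent `room` history tokens (drop the oldest on the left).
    return initial_tokens + history_tokens[-room:]


initial = [1, 2, 3]
history = list(range(10, 30))  # 20 previously decoded tokens
prompt = build_carried_prompt(initial, history, max_prompt_len=8)
print(prompt)
```

With a budget of 8 tokens and a 3-token initial prompt, only the 5 most recent history tokens survive the left-slice.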

kittsil · 2024-09-18T04:10:37Z

There are outstanding issues with this PR:

I have not found the definition of the 224 context token length.
It prepends the initial_prompt to itself before enough tokens have been generated, resulting in a predilection toward looping.
I have not written tests.
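The self-prepending issue in the list above can be illustrated with a small sketch. It assumes that the running token list already starts with the initial prompt tokens; `naive_carry`, `deduplicated_carry`, and `budget` are hypothetical names, not identifiers from this PR.

```python
from typing import List


def naive_carry(initial: List[int], all_tokens: List[int], budget: int) -> List[int]:
    """Prepend the initial prompt to the tail of everything seen so far."""
    room = max(budget - len(initial), 0)
    tail = all_tokens[-room:] if room else []
    return initial + tail


def deduplicated_carry(initial: List[int], all_tokens: List[int], budget: int) -> List[int]:
    """Same idea, but skip the copy of the initial prompt stored in all_tokens."""
    room = max(budget - len(initial), 0)
    # Assumption: all_tokens begins with the initial prompt tokens.
    generated = all_tokens[len(initial):]
    tail = generated[-room:] if room else []
    return initial + tail


initial = [1, 2, 3]
all_tokens = initial + [40]  # only one token has been generated so far

print(naive_carry(initial, all_tokens, budget=8))         # initial prompt appears twice
print(deduplicated_carry(initial, all_tokens, budget=8))  # initial prompt appears once
```

Early in the audio the naive version feeds the decoder a repeated prompt, which is one plausible source of the looping described above; skipping the stored copy avoids that.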

Closing this PR since I can't find a way to move it to draft.

ryanheise · 2024-09-18T04:16:25Z

> Closing this PR since I can't find a way to move it to draft.

How to: https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests/changing-the-stage-of-a-pull-request

ryanheise · 2024-09-18T04:29:29Z

Also a relevant discussion here: #1040 (comment)

> I have not found the definition of the 224 context token length.

It's part of the model dimensions: the text context is 448 tokens in total, and half of that is reserved for the prompt. The logic is in decoding.py; look for self.n_ctx: int = model.dims.n_text_ctx and follow the references to it.


kittsil · 2024-09-19T04:21:39Z

@ryanheise Thank you for your input; it was helpful. Do you mind providing any additional feedback?

Aside: I did find the left-slice in the code, and it turns out that the docs are wrong: the maximum prompt length is actually 223!

Confirming with the medium.en model...

>>> medium = torch.load('/home/kittsil/.cache/whisper/medium.en.pt')
>>> medium['dims']
{'n_mels': 80, 'n_vocab': 51864, 'n_audio_ctx': 1500, 'n_audio_state': 1024, 'n_audio_head': 16, 'n_audio_layer': 24, 'n_text_ctx': 448, 'n_text_state': 1024, 'n_text_head': 16, 'n_text_layer': 24}
>>> medium['dims']['n_text_ctx'] // 2 - 1
223
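One plausible reading of the off-by-one is that a single marker token introduces the prompt and takes one of the 224 slots. The sketch below works under that assumption; `SOT_PREV` is a placeholder id and `prompt_prefix` is a hypothetical helper, not the library's function.

```python
from typing import List

N_TEXT_CTX = 448  # n_text_ctx from the model dims shown above
SOT_PREV = -1     # placeholder id for a "previous text" marker (assumption)


def prompt_prefix(prompt_tokens: List[int], n_ctx: int = N_TEXT_CTX) -> List[int]:
    """Left-slice the prompt so that marker plus prompt fits in half the context."""
    max_prompt_len = n_ctx // 2 - 1  # 223 for n_ctx = 448
    return [SOT_PREV] + prompt_tokens[-max_prompt_len:]


long_prompt = list(range(1000))
prefix = prompt_prefix(long_prompt)
print(len(prefix))  # marker + 223 prompt tokens
print(prefix[1])    # first prompt token that survived the left-slice
```

Under this reading, the marker and the 223 surviving prompt tokens together occupy exactly half of the 448-token context.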

FurkanGozukara · 2024-10-05T15:08:51Z

Hello, if I merge this locally, what option do I add to the command to prevent Whisper from losing punctuation during transcription?

Can you also update the branch here so I can install it directly: https://github.com/kittsil/whisper/tree/patch-1

@kittsil

FurkanGozukara · 2024-10-22T08:48:59Z

Why is this very important feature still not merged, @jongwook?

FurkanGozukara · 2024-10-22T08:50:50Z

@kittsil I use the CLI, so adding --carry_initial_prompt will work, right?

FurkanGozukara · 2024-10-22T09:11:27Z

I am transcribing a 3-hour video, and it is working awesome so far.

How can errors like this be fixed?

kittsil · 2024-10-23T02:42:56Z

> How can errors like this be fixed?

@FurkanGozukara, that's an issue with whisper, not with your prompt. You can try setting compression_ratio_threshold lower; I have found some success with 1.7 (as opposed to the default 2.4).
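For context on why a lower threshold can help: the check compares the raw text size with its zlib-compressed size, so looping output scores much higher than ordinary speech. The sketch below is a standalone illustration of that idea; the sample strings are made up, and `compression_ratio` here is a local helper rather than an import from whisper.

```python
import zlib


def compression_ratio(text: str) -> float:
    """Ratio of raw UTF-8 size to zlib-compressed size; higher means more repetitive."""
    data = text.encode("utf-8")
    return len(data) / len(zlib.compress(data))


looped = "Thanks for watching. " * 40
varied = (
    "The meeting opened with a short review of last quarter, "
    "then moved on to hiring plans and the new office lease."
)

for label, text in (("looped", looped), ("varied", varied)):
    ratio = compression_ratio(text)
    print(f"{label}: ratio={ratio:.2f}, above 1.7: {ratio > 1.7}, above 2.4: {ratio > 2.4}")
```

A stricter (lower) threshold flags moderately repetitive segments that the default would let through, at the cost of occasionally rejecting legitimate text.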

In general, though, I wouldn't comment on a PR for debugging help; it's best to keep PRs focused on the request / review process.

FurkanGozukara · 2024-10-25T00:21:36Z

@kittsil Thank you so much; your PR saved me a great deal.

I transcribed this 3-hour video, and without your PR I would have been devastated, because YouTube auto timing also failed :D

https://youtu.be/FvpWy1x5etM

kittsil closed this Sep 18, 2024

kittsil reopened this Sep 19, 2024

kittsil closed this Sep 19, 2024

kittsil force-pushed the patch-1 branch from 05f6534 to 32d55d5 Compare September 19, 2024 03:55

Kittsil and others added 3 commits September 18, 2024 22:58

Prevent redundant initial_prompt_tokens

fae8ede

Merge branch 'openai:main' into patch-1

207f5b9

kittsil reopened this Sep 19, 2024

Revert unnecessary .gitignore change

afeccc1

FurkanGozukara mentioned this pull request Oct 26, 2024

more pytorch versions in tests #2408

Merged

Merge branch 'main' into patch-1

1778952

jongwook merged commit 5979f03 into openai:main Oct 26, 2024
9 checks passed
