Are timestamp tokens used in previous text? #2140
George0828Zhang started this conversation in General
According to Figure 1 in the Whisper paper, during training the previous-text tokens do not contain timestamp tokens. However, during `transcribe()`, if `without_timestamps=False` and `condition_on_previous_text=True`, the prompt tokens (which contain the previous text) are passed into the model together with their timestamp tokens. I confirmed this by printing `options.prompt` right before this line:

whisper/whisper/transcribe.py, line 195 in ba3f3cd

There are several timestamp tokens in there. If this is indeed the case, wouldn't it cause a train-test mismatch?
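For reference, here is a minimal sketch (not the library's own debugging code) of how the captured prompt can be decoded to flag timestamp tokens; `prompt_tokens` is a hypothetical placeholder for the value printed from `options.prompt`:

```python
from whisper.tokenizer import get_tokenizer

# Minimal sketch: decode captured prompt token IDs and flag timestamp tokens.
# `prompt_tokens` below is a hypothetical placeholder, not actual output.
tokenizer = get_tokenizer(multilingual=True)
prompt_tokens = [tokenizer.timestamp_begin,        # <|0.00|>
                 1396,                             # some ordinary text token
                 tokenizer.timestamp_begin + 1500]  # <|30.00|>

for tok in prompt_tokens:
    if tok >= tokenizer.timestamp_begin:
        # timestamp tokens encode 0.00 s upward in 0.02 s steps
        seconds = (tok - tokenizer.timestamp_begin) * 0.02
        print(f"<|{seconds:.2f}|>  (timestamp token {tok})")
    else:
        print(repr(tokenizer.decode([tok])))
```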
Current transcript:

If both the previous text and the current segment have timestamps, then the current segment's timestamps should start from 30 s. But the vocabulary has no timestamp tokens for > 30 s.
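A quick way to check that the timestamp vocabulary tops out at 30 s, assuming the timestamp tokens `<|0.00|>` through `<|30.00|>` occupy the tail of the vocabulary as in the tiktoken-based tokenizer (a sketch, not an official check):

```python
from whisper.tokenizer import get_tokenizer

# Sketch: count the timestamp tokens and compute the largest encodable time.
# Assumes the timestamp tokens sit at the end of the vocabulary.
tokenizer = get_tokenizer(multilingual=True)
n_timestamps = tokenizer.encoding.n_vocab - tokenizer.timestamp_begin
print(n_timestamps)               # 1501 tokens, in 0.02 s steps
print((n_timestamps - 1) * 0.02)  # 30.0 -- no token exists for > 30 s
```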