Running the same code twice giving two different results #1085

bchinnari · 2024-10-24T12:37:46Z

Hi, I am running faster-whisper on an audio file as follows:

segments, info = model.transcribe(wav, task="transcribe", language="hi", beam_size=1, word_timestamps=True, max_new_tokens=50)

Running this same code on the same audio file sometimes gives two segments and sometimes one. I find this odd. Is this expected? Whenever it gives two segments, the second one is always an insertion: there is no speech in that part, but the model still outputs some words.

However, if I slightly modify the call so that it does not output word timestamps, as follows:

segments, info = model.transcribe(wav, task="transcribe", language="hi", beam_size=1, max_new_tokens=50)

I always get only one segment in the output with good accuracy.
Is the presence of "word_timestamps=True" messing this up ?
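One way to pin down the run-to-run difference is to repeat the exact same call several times and collect the distinct results. Below is a minimal, library-agnostic sketch; `distinct_outputs` and `transcribe_once` are hypothetical names, and the commented-out function only suggests how the call from this issue might be plugged in (the faster-whisper part is not executed here).

```python
from itertools import cycle
from typing import Callable, List, Sequence


def distinct_outputs(transcribe_once: Callable[[], Sequence[object]], runs: int = 5) -> List[list]:
    """Call transcribe_once() `runs` times and return every distinct result, in first-seen order."""
    seen: List[list] = []
    for _ in range(runs):
        result = list(transcribe_once())
        if result not in seen:
            seen.append(result)
    return seen


# How the call from this issue might be wrapped (not executed here).
# `segments` is a generator: decoding only runs while it is being consumed.
# Recording the temperature shows whether a sampling fallback (> 0.0) was used.
#
# def transcribe_once():
#     segments, info = model.transcribe(wav, task="transcribe", language="hi",
#                                       beam_size=1, word_timestamps=True,
#                                       max_new_tokens=50)
#     return [(s.text, s.temperature) for s in segments]

if __name__ == "__main__":
    # Stand-in that alternates between a one-segment and a two-segment result.
    fake_runs = cycle([["सितम्बर 19"], ["सितम्बर 19", "सितम्बर"]])
    print(len(distinct_outputs(lambda: next(fake_runs), runs=4)))  # 2
```

If more than one distinct result appears with `word_timestamps=True` but only one with it off, that isolates the flag as the trigger.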


bchinnari · 2024-10-24T18:32:38Z

Is this possible? Has anyone else observed this?

bchinnari · 2024-10-25T09:01:31Z

OK, here is what I did. I took a pretrained HF model (https://huggingface.co/vasista22/whisper-hindi-small) and fine-tuned it on my data. Then I converted the checkpoint to faster-whisper format.

If I use "word_timestamps=True" in the transcribe function, I get extra (useless) segments in the output. I don't know why.

This does not happen if I use the Whisper model directly for transcription; it happens only with my fine-tuned model.

bchinnari · 2024-10-25T16:54:14Z

When "word_timestamps=False", the output is:
Segment(id=1, seek=600, start=0.0, end=6.0, text='सितम्बर 19', tokens=[50364, 45938, 33279, 36158, 48521, 27099, 3941, 105, 25411, 1294], temperature=0.0, avg_logprob=-0.17912933772260492, compression_ratio=0.6857142857142857, no_speech_prob=1.3633834695708693e-14, words=None)

When it is True, the output is:
Segment(id=1, seek=252, start=np.float64(0.0), end=np.float64(2.52), text='सितम्बर 19', tokens=[50364, 45938, 33279, 36158, 48521, 27099, 3941, 105, 25411, 1294], temperature=0.0, avg_logprob=-0.17907507040283896, compression_ratio=0.6857142857142857, no_speech_prob=1.3633834695708693e-14, words=[Word(start=np.float64(0.0), end=np.float64(2.16), word='सितम्बर', probability=np.float64(0.999978095293045)), Word(start=np.float64(2.16), end=np.float64(2.52), word=' 19', probability=np.float64(0.9993481040000916))])
Segment(id=2, seek=496, start=np.float64(2.52), end=np.float64(4.96), text='सितम्बर', tokens=[50364, 45938, 33279, 36158, 48521, 27099, 3941, 105, 25411], temperature=0.0, avg_logprob=-0.4705956637859344, compression_ratio=0.65625, no_speech_prob=0.02291429601609707, words=[Word(start=np.float64(2.52), end=np.float64(4.96), word='सितम्बर', probability=np.float64(0.741854028776288))])
Segment(id=3, seek=598, start=np.float64(4.96), end=np.float64(5.98), text='सितम्बर 19', tokens=[50364, 45938, 33279, 36158, 48521, 27099, 3941, 105, 25411, 1294], temperature=0.0, avg_logprob=-0.4931728406385942, compression_ratio=0.6857142857142857, no_speech_prob=0.2716793119907379, words=[Word(start=np.float64(4.96), end=np.float64(5.98), word='सितम्बर', probability=np.float64(0.7634602943435311)), Word(start=np.float64(5.98), end=np.float64(5.98), word=' 19', probability=np.float64(4.705471383203985e-06))])
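The `seek` values differ sharply between the two runs, and in both runs each segment's `seek` lines up with its end time. Assuming `seek` is a mel-frame index at Whisper's usual 16 kHz sample rate and 160-sample hop (100 frames per second), the quoted values convert to seconds as below. One reading (an assumption, not confirmed in this thread) is that with the flag on, decoding resumes right after the last word (2.52 s, then 4.96 s) instead of jumping to the end of the 6 s audio, which gives the model extra passes over the trailing non-speech part.

```python
# Assumption: 16 kHz audio and a 160-sample hop, i.e. 100 mel frames per second.
SAMPLE_RATE = 16_000
HOP_LENGTH = 160
FRAMES_PER_SECOND = SAMPLE_RATE // HOP_LENGTH  # 100


def seek_to_seconds(seek: int) -> float:
    """Convert a segment's `seek` (mel-frame index) to seconds."""
    return seek / FRAMES_PER_SECOND


# (run, segment id, seek) copied from the two outputs quoted above.
observed = [
    ("word_timestamps=False", 1, 600),
    ("word_timestamps=True", 1, 252),
    ("word_timestamps=True", 2, 496),
    ("word_timestamps=True", 3, 598),
]

for run, seg_id, seek in observed:
    print(f"{run:22} segment {seg_id}: seek={seek} -> {seek_to_seconds(seek):.2f} s")
```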

When the flag is False, both the text and the number of segments are correct, but the segment end is marked as 6.0, which is incorrect: 6 s is the duration of the WAV file.
When the flag is True, the text and the end time of the first segment are correct, but two extra, incorrect segments are produced.
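Until the root cause is found, one possible stop-gap is to post-filter the segments. The sketch below is a heuristic, not a fix, and its thresholds are hypothetical values chosen to fit the three segments quoted above. It drops a segment whose `avg_logprob` is below -0.3, that contains a word with probability under 0.01, or whose text only repeats the start of the previous kept segment. `Seg` and `Word` are minimal stand-ins for faster-whisper's `Segment` and `Word` so the code runs without the library. Segments 2 and 3 (`avg_logprob` about -0.47 and -0.49, `no_speech_prob` 0.02 and 0.27) would not be caught by faster-whisper's default `log_prob_threshold=-1.0` and `no_speech_threshold=0.6`.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Word:
    # Hypothetical stand-in for faster-whisper's Word (only the field used here).
    word: str
    probability: float


@dataclass
class Seg:
    # Hypothetical stand-in for faster-whisper's Segment (only the fields used here).
    id: int
    text: str
    avg_logprob: float
    words: Optional[List[Word]] = field(default_factory=list)


def keep_segment(seg, prev_text: str, min_avg_logprob: float = -0.3, min_word_prob: float = 0.01) -> bool:
    """Heuristic check; both thresholds are hypothetical and need tuning on real data."""
    if seg.avg_logprob < min_avg_logprob:
        return False  # low overall confidence
    if any(w.probability < min_word_prob for w in (seg.words or [])):
        return False  # contains a near-zero-probability word
    text = seg.text.strip()
    if prev_text and prev_text.startswith(text):
        return False  # only repeats the start of the previous kept segment
    return True


def filter_segments(segments) -> list:
    """Keep segments that pass keep_segment(), comparing each against the last kept one."""
    kept = []
    prev_text = ""
    for seg in segments:
        if keep_segment(seg, prev_text):
            kept.append(seg)
            prev_text = seg.text.strip()
    return kept


# Values copied from the word_timestamps=True output quoted above.
quoted = [
    Seg(1, "सितम्बर 19", -0.17907507040283896,
        [Word("सितम्बर", 0.999978095293045), Word(" 19", 0.9993481040000916)]),
    Seg(2, "सितम्बर", -0.4705956637859344, [Word("सितम्बर", 0.741854028776288)]),
    Seg(3, "सितम्बर 19", -0.4931728406385942,
        [Word("सितम्बर", 0.7634602943435311), Word(" 19", 4.705471383203985e-06)]),
]

print([s.id for s in filter_segments(quoted)])  # [1]
```

Because the checks only read `text`, `avg_logprob` and `words`, the same functions should also accept real faster-whisper segments; `words` is `None` when word timestamps are off, which the `or []` handles.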

Is there something obviously wrong here?
