[SSML] Wrong settings used for speech synthesis #27

roehrt · 2022-11-22T22:14:48Z

Describe the bug
Mimic3 ignores the settings of the current <prosody> block and applies the settings of the last closed <prosody> block instead.

To Reproduce
mimic3 '<prosody rate="200%">This should be spoken fast but is not.</prosody><prosody volume="30%">This should be a bit quieter but is actually spoken faster</prosody>' --ssml | aplay

Expected behavior
Mimic3 should speak the first sentence fast and the second one with lowered volume.

Environment
- Device type: desktop
- OS: Ubuntu 22.04

Source of actual behavior

mimic3/mimic3_tts/tts.py

Lines 470 to 501 in be72c18

    
    def end_utterance(self) -> typing.Iterable[BaseResult]:
        last_settings: typing.Optional[Mimic3Settings] = None
        sent_phonemes: PHONEMES_LIST_TYPE = []
        for result in self._results:
            if isinstance(result, Mimic3Phonemes):
                if result.is_utterance:
                    # Utterance boundary
                    if (
                        sent_phonemes
                        and (last_settings is not None)
                        and (result.current_settings != last_settings)
                    ):
                        # Not compatible with existing utterance.
                        # Need to speak previous utterance first.
                        yield self._speak_sentence_phonemes(
                            sent_phonemes, settings=last_settings
                        )
                        sent_phonemes.clear()
                    # Current utterance
                    sent_phonemes.extend(result.phonemes)
                    if sent_phonemes:
                        yield self._speak_sentence_phonemes(
                            sent_phonemes, settings=last_settings
                        )
                        sent_phonemes.clear()
                else:
                    # Continue until utterance boundary
                    sent_phonemes.extend(result.phonemes)
                last_settings = result.current_settings
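
To make the off-by-one visible, here is a minimal standalone sketch of the loop above. The `Phonemes` dataclass, the `speak_all` function, and the `use_current` flag are hypothetical stand-ins and not part of mimic3_tts; settings are reduced to plain strings. With `use_current=False` the loop mirrors the quoted code, where the current utterance is spoken with `last_settings`. With `use_current=True` it uses `result.current_settings` for the current utterance, which is one possible fix and only an assumption about what the intended behavior is.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Optional, Tuple


@dataclass
class Phonemes:
    """Hypothetical stand-in for Mimic3Phonemes."""

    phonemes: List[str]
    current_settings: Optional[str]  # settings reduced to a string for the demo
    is_utterance: bool = True


Spoken = Tuple[List[str], Optional[str]]


def speak_all(results: Iterable[Phonemes], use_current: bool) -> Iterator[Spoken]:
    """Yield (phonemes, settings) pairs instead of synthesizing audio."""
    last_settings: Optional[str] = None
    sent: List[str] = []
    for result in results:
        if result.is_utterance:
            # Flush pending phonemes that belong to different settings.
            if sent and last_settings is not None and result.current_settings != last_settings:
                yield list(sent), last_settings
                sent.clear()
            sent.extend(result.phonemes)
            if sent:
                # The quoted code passes last_settings here (use_current=False).
                settings = result.current_settings if use_current else last_settings
                yield list(sent), settings
                sent.clear()
        else:
            # Continue until utterance boundary.
            sent.extend(result.phonemes)
        last_settings = result.current_settings


results = [
    Phonemes(["fast"], "rate=200%"),
    Phonemes(["quiet"], "volume=30%"),
]
buggy = list(speak_all(results, use_current=False))
fixed = list(speak_all(results, use_current=True))
print(buggy)  # [(['fast'], None), (['quiet'], 'rate=200%')]
print(fixed)  # [(['fast'], 'rate=200%'), (['quiet'], 'volume=30%')]
```

In the `buggy` output the first sentence gets no settings and the second gets the first sentence's rate, which matches the behavior described in this report.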


stephenrt42 · 2023-05-25T00:18:43Z

Try this:
<speak> <prosody rate="200%"><s>This should be spoken fast but is not.</s></prosody><prosody volume="30%"><s>This should be a bit quieter but is actually spoken faster</s></prosody> </speak>
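
Whether this variant actually avoids the problem is not confirmed in this thread. As a small sanity check that does not call Mimic3 at all, the sketch below uses only Python's standard library to verify that the suggested document is well-formed XML and to list which <prosody> attributes wrap each <s> sentence. The variable names are illustrative.

```python
import xml.etree.ElementTree as ET

ssml = (
    '<speak> <prosody rate="200%"><s>This should be spoken fast but is not.</s></prosody>'
    '<prosody volume="30%"><s>This should be a bit quieter but is actually spoken faster</s></prosody> </speak>'
)

# fromstring raises xml.etree.ElementTree.ParseError if the markup is not well-formed.
root = ET.fromstring(ssml)

# Pair each sentence with the attributes of the <prosody> element that wraps it.
sentences = [
    (dict(prosody.attrib), sentence.text)
    for prosody in root.iter("prosody")
    for sentence in prosody.iter("s")
]

for attrs, text in sentences:
    print(attrs, text)
```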

roehrt added the bug label on Nov 22, 2022
