Ogmios: The UPC Text-to-Speech synthesis system for Spoken Translation

This paper presents Ogmios, the baseline text-to-speech system developed at UPC, together with our recent work on speech prosody generation and the procedures used to create high-quality language resources for speech synthesis. These contributions have been evaluated within the TC-STAR European project, which focuses on speech-to-speech translation. Several of them were developed to adapt the TTS component to the speech-to-speech translation framework. In this application, the input is not written-style text but transcriptions of talks, and the system must also cope with errors introduced by the speech recognition and speech translation engines. On the other hand, the source speech is available in speech-to-speech translation and can serve as a valuable source of information for generating the target prosody. The general framework and first results are presented in the paper.
