Applying a hybrid intonation model to a seamless speech synthesizer

We present a speech synthesizer that seamlessly concatenates recorded and synthetic phrases to produce natural-sounding, highly expressive speech. Both the acoustic units and the F0 contours of the recorded and synthetic phrases are concatenated seamlessly. When synthetic phrases are mixed with recorded phrases, their F0 contours are generated adaptively, relative to the actual F0 shapes of the surrounding recorded phrases. Although the intonation generation scheme was originally developed for unlimited speech synthesis, it extends naturally to hybrid intonation generation.
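As an illustration of what adapting a synthetic F0 contour to its recorded neighbours could look like, the following is a minimal sketch and not the method of this paper. It assumes a simple log-domain endpoint correction: the synthetic contour is shifted so that its first and last values meet the F0 at the adjacent recorded phrase boundaries, with the correction interpolated linearly across the phrase. The function name `adapt_f0` and its parameters are hypothetical, and all frames are assumed to be voiced (F0 > 0).

```python
import math

def adapt_f0(synthetic_f0, left_f0_end, right_f0_start):
    """Shift a synthetic F0 contour (Hz) so its endpoints meet the recorded neighbours.

    Hypothetical helper: the correction is applied in log-F0 and
    interpolated linearly from the left boundary to the right boundary.
    Assumes every value is a voiced frame (F0 > 0).
    """
    n = len(synthetic_f0)
    if n == 0:
        return []
    log_f0 = [math.log(f) for f in synthetic_f0]
    # Offsets needed at each end to match the recorded boundary values.
    start_offset = math.log(left_f0_end) - log_f0[0]
    end_offset = math.log(right_f0_start) - log_f0[-1]
    adapted = []
    for i, lf in enumerate(log_f0):
        # Interpolation weight: 0 at the left boundary, 1 at the right.
        w = i / (n - 1) if n > 1 else 0.0
        adapted.append(math.exp(lf + (1.0 - w) * start_offset + w * end_offset))
    return adapted

# Example: a synthetic phrase placed between two recorded phrases.
contour = adapt_f0([100.0, 120.0, 100.0], left_f0_end=110.0, right_f0_start=90.0)
```

The correction is applied in the log domain because pitch differences are roughly perceived as ratios; this preserves the relative shape of the synthetic contour while removing discontinuities at the joins.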
