Fully automatic prosody generator for text-to-speech

Text-to-prosody systems that draw on prosodic databases extracted from natural speech are likely to be key to the further development of text-to-speech systems. This paper describes a system that uses such a database to generate the rhythm and intonation of written French text. The system combines a very crude chinks ’n chunks prosodic phrasing algorithm with a prosodic analysis of a natural speech database. The rhythm of the synthetic speech is generated by a CART (classification and regression tree) trained on a large single-speaker speech corpus. The acoustic realization of the intonation is built from a set of prosodic patterns automatically extracted from the same corpus. The system has been tested on isolated sentences and on news paragraphs. Informal listening tests indicate that the resulting prosody is convincing most of the time.
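To make the phrasing step concrete, the following is a minimal sketch of the general chinks ’n chunks idea, not the system's actual implementation. It assumes the usual formulation in which function words ("chinks") open a prosodic group, content words ("chunks") close it, and a boundary falls wherever a chunk is followed by a chink. The word list `CHINKS` and the function name `chinks_n_chunks` are hypothetical and chosen only for illustration; a real system would rely on a part-of-speech tagger or a much fuller lexicon.

```python
# Hypothetical, deliberately tiny list of French function words ("chinks").
# Assumption: anything not in this set is treated as a content word ("chunk").
CHINKS = {
    "le", "la", "les", "un", "une", "des", "de", "du", "à", "au", "aux",
    "et", "ou", "dans", "sur", "pour", "par", "avec", "il", "elle", "ils",
    "elles", "je", "tu", "nous", "vous", "on", "ce", "qui", "que", "se",
    "ne", "en",
}


def chinks_n_chunks(words):
    """Split a list of words into prosodic groups.

    A group is a run of chinks followed by a run of chunks, so a new
    group starts each time a chink follows a chunk.
    """
    phrases = []
    current = []
    prev_was_chunk = False
    for word in words:
        is_chink = word.lower() in CHINKS
        if is_chink and prev_was_chunk:
            # Chunk-to-chink transition: close the current group.
            phrases.append(current)
            current = []
        current.append(word)
        prev_was_chunk = not is_chink
    if current:
        phrases.append(current)
    return phrases


sentence = "le chat noir dort sur le tapis de la cuisine"
groups = chinks_n_chunks(sentence.split())
print(" | ".join(" ".join(group) for group in groups))
# prints: le chat noir dort | sur le tapis | de la cuisine
```

The heuristic needs no syntactic parse, which is why it is cheap and also why it is crude: any word missing from the function-word list is silently treated as a content word.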
