Paper information - A selection/concatenation text-to-speech synthesis system: databases development, system design, comparative evaluation

This paper describes the development of a new text-to-speech synthesis system for French. The system is based on the selection and concatenation of natural speech segments taken from large annotated speech databases. The first part presents the design, content, and annotation procedures of the databases. It appears that speech databases of about 1 hour are large enough for building a TTS system. The second part describes the system architecture. A key feature of the present system is that only 4 simple and efficient selection criteria are proposed. The third part describes a formal comparative evaluation procedure. The experiments show that the new system is preferred in all evaluation categories over the previous system, which is based on diphone concatenation and rule-based synthesis of the prosody. The most significant improvements brought by the new system seem to be in voice pleasantness and overall impression.

Christophe d'Alessandro | Romain Prudon
