A selection/concatenation text-to-speech synthesis system: databases development, system design, comparative evaluation

This paper describes the development of a new text-to-speech (TTS) synthesis system for French. The system is based on the selection and concatenation of natural speech segments taken from large annotated speech databases. The first part presents the design, content, and annotation procedures of the databases. It appeared that speech databases of about 1 hour are large enough to build a TTS system. The second part describes the system architecture. A key feature of the present system is that it uses only 4 simple and efficient selection criteria. The third part describes a formal comparative evaluation procedure. The experiments show that the new system is preferred in all evaluation categories over the previous system, which is based on diphone concatenation and rule-based prosody synthesis. The most significant improvements brought by the new system seem to be in voice pleasantness and overall impression.