OpenMary - Open Source Unit Selection as the Basis for Research on Expressive Synthesis

This paper describes the unit selection component of the open source text-to-speech system OpenMary. It is a generic unit selection component with parameterisable target and join cost functions, followed by optional signal post-processing, and it has been created for the Blizzard Challenge 2006. We describe the creation process, the core properties of the system and the preparation of a synthesis voice from the Blizzard data. We also point out the directions in which we are currently developing the system towards expressive unit selection.

Index Terms: speech synthesis, unit selection, expressive speech, target cost, signal processing
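To make the idea of parameterisable target and join cost functions concrete, the following is a minimal sketch of generic unit selection as a dynamic-programming search. It is an illustration only, not the OpenMary implementation: the function name `select_units`, the weights `w_target` and `w_join`, and the toy cost functions are all assumptions introduced here.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")  # target specification (hypothetical type)
U = TypeVar("U")  # candidate unit (hypothetical type)


def select_units(
    targets: Sequence[T],
    candidates: Sequence[Sequence[U]],
    target_cost: Callable[[T, U], float],
    join_cost: Callable[[U, U], float],
    w_target: float = 1.0,
    w_join: float = 1.0,
) -> Tuple[List[U], float]:
    """Return the candidate sequence with the lowest weighted total cost.

    Both cost functions and both weights are passed in, so the search
    itself stays independent of how costs are defined.
    """
    if len(targets) != len(candidates) or any(len(c) == 0 for c in candidates):
        raise ValueError("each target needs a non-empty candidate list")
    if not targets:
        return [], 0.0

    # costs[j]: cheapest cumulative cost of any path ending in candidate j
    costs = [w_target * target_cost(targets[0], u) for u in candidates[0]]
    back: List[List[int]] = []  # back-pointers for positions 1..n-1

    for i in range(1, len(targets)):
        prev = candidates[i - 1]
        new_costs: List[float] = []
        pointers: List[int] = []
        for u in candidates[i]:
            # Cheapest way to reach u from any candidate at position i-1
            best_k, best = 0, float("inf")
            for k, p in enumerate(prev):
                c = costs[k] + w_join * join_cost(p, u)
                if c < best:
                    best_k, best = k, c
            new_costs.append(best + w_target * target_cost(targets[i], u))
            pointers.append(best_k)
        costs = new_costs
        back.append(pointers)

    # Trace back from the cheapest final candidate
    j = min(range(len(costs)), key=costs.__getitem__)
    total = costs[j]
    path = [j]
    for pointers in reversed(back):
        j = pointers[j]
        path.append(j)
    path.reverse()
    return [candidates[i][j] for i, j in enumerate(path)], total


# Toy usage: units and targets are plain numbers (e.g. a pitch-like feature).
def toy_distance(a: float, b: float) -> float:
    return abs(a - b)


toy_targets = [1.0, 2.0]
toy_candidates = [[1.0, 5.0], [2.0, 5.0]]
print(select_units(toy_targets, toy_candidates, toy_distance, toy_distance))
print(select_units(toy_targets, toy_candidates, toy_distance, toy_distance,
                   w_join=10.0))
```

In this toy setup, raising `w_join` makes the search prefer a smoother sequence even when its units match the targets less well, which is the kind of trade-off that adjustable cost weights expose.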

Anna Hunecke
