OpenMary - Open Source Unit Selection as the Basis for Research on Expressive Synthesis

This paper describes the unit selection component of the open source text-to-speech system OpenMary. It is a generic unit selection component with parameterisable target and join cost functions, followed by optional signal post-processing, and it was created for the Blizzard Challenge 2006. We describe the creation process, the core properties of the system, and the preparation of a synthesis voice from the Blizzard data. We also point out the directions in which we are currently developing the system towards expressive unit selection.

Index Terms: speech synthesis, unit selection, expressive speech, target cost, signal processing
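To make the notion of parameterisable target and join cost functions concrete, the following is a minimal sketch of unit selection as a dynamic-programming (Viterbi) search. It is not OpenMary's implementation: the `Unit` fields, the function `select_units`, the pitch-based example costs, and the weights `w_target` and `w_join` are all hypothetical names chosen for illustration, and the use of a Viterbi search is an assumption based on common practice in unit selection.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class Unit:
    """A target specification or a database unit (fields are illustrative)."""
    phone: str
    pitch: float  # Hz; a stand-in for a richer feature vector


Cost = Callable[[Unit, Unit], float]


def pitch_target_cost(target: Unit, candidate: Unit) -> float:
    """Example target cost: distance between requested and actual pitch."""
    return abs(target.pitch - candidate.pitch)


def pitch_join_cost(left: Unit, right: Unit) -> float:
    """Example join cost: pitch discontinuity at the concatenation point."""
    return abs(left.pitch - right.pitch)


def select_units(
    targets: Sequence[Unit],
    candidates: Sequence[Sequence[Unit]],
    target_cost: Cost = pitch_target_cost,
    join_cost: Cost = pitch_join_cost,
    w_target: float = 1.0,
    w_join: float = 1.0,
) -> List[Unit]:
    """Return the candidate sequence with the lowest weighted total cost."""
    if len(targets) != len(candidates):
        raise ValueError("need one candidate list per target")
    if not targets:
        return []
    if any(len(c) == 0 for c in candidates):
        raise ValueError("every target needs at least one candidate")

    # costs[j]: cheapest path cost ending in candidate j of the current target.
    costs = [w_target * target_cost(targets[0], c) for c in candidates[0]]
    backpointers: List[List[int]] = []

    for i in range(1, len(targets)):
        new_costs, pointers = [], []
        for cand in candidates[i]:
            # Cost of reaching `cand` from each predecessor candidate.
            via = [
                costs[k] + w_join * join_cost(prev, cand)
                for k, prev in enumerate(candidates[i - 1])
            ]
            k_best = min(range(len(via)), key=via.__getitem__)
            new_costs.append(via[k_best] + w_target * target_cost(targets[i], cand))
            pointers.append(k_best)
        costs = new_costs
        backpointers.append(pointers)

    # Trace the cheapest path back from the last target to the first.
    j = min(range(len(costs)), key=costs.__getitem__)
    path = [j]
    for pointers in reversed(backpointers):
        j = pointers[j]
        path.append(j)
    path.reverse()
    return [candidates[i][j] for i, j in enumerate(path)]
```

Because the cost functions and their weights are passed in as arguments, the same search can be reused with different cost definitions. Raising `w_join` relative to `w_target`, for example, favours smooth concatenations over close matches to the target specification.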
