The German Text-to-Speech Synthesis System MARY: A Tool for Research, Development and Teaching

This paper introduces the German text-to-speech synthesis system MARY. The system's main features, namely a modular design and an XML-based system-internal data representation, are pointed out, and the properties of the individual modules are briefly presented. An interface allowing the user to access and modify intermediate processing steps without the need for a technical understanding of the system is described, along with examples of how this interface can be put to use in research, development and teaching. The usefulness of the modular and transparent design approach is further illustrated with an early prototype of an interface for emotional speech synthesis.
