MARY TTS participation in the Blizzard Challenge 2007

This paper describes the second participation of the open source MARY TTS unit selection system in the Blizzard Challenge. Compared to last year’s system, a number of well-defined changes have been made to the algorithm, concerning unit definition, prosody models, and signal modification. The results in this year’s challenge are considerably improved, confirming that the changes were worthwhile. The paper also reports on an approach to selecting a subset of the provided utterances, in order to build a voice with good coverage that is no larger than the pre-defined “Arctic” subset. Results show that this small voice is perceived as slightly better than the voice we built from the Arctic subset.
