Paper information - The Toshiba entry for the 2007 Blizzard Challenge


This paper describes the system with which we took part in the Blizzard Challenge for the first time. It describes how we created our own annotation from scratch and introduces the various system components, in particular the back-end. Results show that our system achieves excellent intelligibility, in particular with less data (e.g. the lowest word error rate for voice C), and reasonable naturalness (MOS of 3.0, 2.9 and 2.9 for voices A, B and C, respectively). They also reveal that our simple approach of randomly selecting sentences for voice C worked well.

Sabine Buchholz | Norbert Braunschweiler | Masahiro Morita | Gabe Webster
