Paper information - Decision tree usage for incremental parametric speech synthesis

Human speakers plan and deliver their utterances incrementally, piece by piece, and it is obvious that their choice regarding phonetic details (and the details' peculiarities) is rarely determined by globally optimal solutions. In contrast, parametric speech synthesizers use a full-utterance context when optimizing vocoding parameters and when determining HMM states. Apart from being cognitively implausible, this impedes incremental use-cases, where the future context is often at least partially unavailable. This paper investigates the 'locality' of features in parametric speech synthesis voices and takes some missing steps towards better HMM state selection and prosody modelling for incremental speech synthesis.

Timo Baumann
