Discrete/Continuous Modelling of Speaking Style in HMM-Based Speech Synthesis: Design and Evaluation

This paper assesses the ability of an HMM-based speech synthesis system to model the speech characteristics of various speaking styles. A discrete/continuous HMM is presented to model the symbolic and acoustic speech characteristics of a speaking style. The proposed model is used to capture the average characteristics of a speaking style that is shared among various speakers, depending on specific situations of speech communication. The evaluation consists of an identification experiment on 4 speaking styles based on delexicalized speech, which is compared to a similar experiment on natural speech. The comparison is discussed and reveals that the discrete/continuous HMM consistently models the speech characteristics of a speaking style.

Index Terms: speaking style, speech synthesis, speech prosody, average modelling.
