Style estimation of speech based on multiple regression hidden semi-Markov model

This paper presents a technique for estimating the degree or intensity of emotional expressions and speaking styles that appear in speech. The key idea builds on a style control technique for speech synthesis that uses a multiple regression hidden semi-Markov model (MRHSMM); the proposed technique can be viewed as the inverse process of style control. We derive an algorithm, based on a maximum likelihood (ML) criterion, for estimating the predictor variables of the MRHSMM, each of which represents the intensity of an emotion or the variability of a speaking style as it appears in the acoustic features. We also present preliminary experimental results that demonstrate the ability of the proposed technique on synthetic and acted speech samples with emotional expressions and speaking styles.
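
To make the "inverse of style control" idea concrete, the following is a minimal sketch and not the algorithm derived in the paper. It assumes a single scalar style value `v`, a fixed state alignment, diagonal-covariance Gaussian output distributions whose means are linear in `v` (mean = bias + slope * v), and it ignores the duration distributions of the HSMM. The function `estimate_style` and the `states` layout are hypothetical names introduced only for illustration. Under these assumptions the ML estimate of `v` reduces to a weighted least-squares solution in closed form.

```python
def estimate_style(observations, alignment, states):
    """ML estimate of a scalar style value v under a fixed state alignment.

    Assumption (illustrative only): each state has a diagonal-covariance
    Gaussian whose mean is linear in v, i.e. mean[d] = bias[d] + slope[d] * v.
    Maximizing the log-likelihood over v then gives
        v = sum(slope * (o - bias) / var) / sum(slope**2 / var),
    where the sums run over all frames and feature dimensions.
    """
    num = 0.0
    den = 0.0
    for obs, state_id in zip(observations, alignment):
        st = states[state_id]
        for o, b, a, var in zip(obs, st["bias"], st["slope"], st["var"]):
            num += a * (o - b) / var
            den += a * a / var
    if den == 0.0:
        raise ValueError("style value is not identifiable: all slopes are zero")
    return num / den


# Hypothetical two-state model with two-dimensional features.
states = {
    0: {"bias": [0.0, 1.0], "slope": [1.0, -0.5], "var": [1.0, 2.0]},
    1: {"bias": [2.0, 0.0], "slope": [0.5, 2.0], "var": [0.5, 1.0]},
}

# Forward direction (style control): generate noise-free frames for a known v.
true_v = 0.7
alignment = [0, 0, 1, 1, 1]
observations = [
    [b + a * true_v for b, a in zip(states[s]["bias"], states[s]["slope"])]
    for s in alignment
]

# Inverse direction (style estimation): recover v from the frames.
v_hat = estimate_style(observations, alignment, states)
```

With several style dimensions, the division would be replaced by solving a small linear system, and in practice the alignment would itself have to be estimated rather than given.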
