Speech emotion recognition separately from voiced and unvoiced sound for emotional interaction robot

The purpose of this paper is to describe the realization of speech emotion recognition. Speech emotion recognition is generally performed in a text-independent mode, so previous research has discounted the fact that emotion features vary with the text or phonemes, even though this can distort classification performance. To overcome this distortion, a framework for speech emotion recognition is proposed based on the segmentation of voiced and unvoiced sound. Because vocalization differs substantially between voiced and unvoiced sound, their emotion features have different characteristics, and they should therefore be considered separately. In this paper, voiced/unvoiced classification is performed using the spectral flatness measure and the spectral center, and a Gaussian mixture model with five mixtures is employed for emotion recognition. To validate the proposed framework, two systems are compared: the first classifies emotion using whole utterances (the ordinary method) and the second uses segments of voiced and unvoiced sound (the proposed method). The proposed approach yields higher classification rates than previous systems, both when each emotion feature is used individually (linear prediction coding (LPC), mel-frequency cepstral coefficients (MFCCs), perceptual linear prediction (PLP) and energy) and when the four features are combined.
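As a rough illustration of the pipeline described above, the following Python sketch segments an utterance into voiced and unvoiced frames using the spectral flatness measure and spectral centroid, then fits one five-mixture Gaussian mixture model per emotion on each segment type. The thresholds, the use of MFCCs as a stand-in for the paper's full feature set (LPC, MFCCs, PLP, energy), and all function names are illustrative assumptions, not the authors' implementation.

    # Illustrative sketch only; thresholds and features are assumptions,
    # not the paper's exact values.
    import numpy as np
    import librosa
    from sklearn.mixture import GaussianMixture

    def split_voiced_unvoiced(y, sr, flatness_thresh=0.3, centroid_thresh=2000.0):
        """Label frames voiced/unvoiced from spectral flatness and spectral centroid.

        Voiced frames tend to be tonal (low flatness) with a low spectral
        centroid; unvoiced frames are noise-like with the opposite trend.
        """
        flatness = librosa.feature.spectral_flatness(y=y)[0]
        centroid = librosa.feature.spectral_centroid(y=y, sr=sr)[0]
        return (flatness < flatness_thresh) & (centroid < centroid_thresh)

    def frame_features(y, sr):
        """MFCCs as a stand-in for the paper's feature set."""
        return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).T  # (frames, 13)

    def train_models(utterances, labels, sr):
        """One 5-mixture GMM per emotion, trained separately on voiced and unvoiced frames."""
        models = {}
        for emotion in set(labels):
            v_feats, u_feats = [], []
            for y, lab in zip(utterances, labels):
                if lab != emotion:
                    continue
                feats = frame_features(y, sr)
                mask = split_voiced_unvoiced(y, sr)[: len(feats)]
                v_feats.append(feats[mask])
                u_feats.append(feats[~mask])
            models[emotion] = (
                GaussianMixture(n_components=5).fit(np.vstack(v_feats)),
                GaussianMixture(n_components=5).fit(np.vstack(u_feats)),
            )
        return models

    def classify(y, sr, models):
        """Score voiced and unvoiced segments under each emotion's GMM pair."""
        feats = frame_features(y, sr)
        mask = split_voiced_unvoiced(y, sr)[: len(feats)]
        scores = {
            emo: gmm_v.score(feats[mask]) + gmm_u.score(feats[~mask])
            for emo, (gmm_v, gmm_u) in models.items()
        }
        return max(scores, key=scores.get)

Summing the two average log-likelihoods weights the voiced and unvoiced segments equally in the decision; the paper's actual decision rule and feature fusion may differ.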
