The GMM-SVM Supervector Approach for the Recognition of the Emotional Status from Speech

Emotion recognition from speech is an important field of research in human-machine-interfaces, and has various applications, for instance for call centers. In the proposed classifier system RASTA-PLP features (perceptual linear prediction) are extracted from the speech signals. The first step is to compute an universal background model (UBM) representing a general structure of the underlying feature space of speech signals. This UBM is modeled as a Gaussian mixture model (GMM). After computing the UBM the sequence of feature vectors extracted from the utterance is used to re-train the UBM. From this GMM the mean vectors are extracted and concatenated to the so-called GMM supervectors which are then applied to a support vector machine classifier. The overall system has been evaluated by using utterances from the public Berlin emotional database. Utilizing the proposed features a recognition rate of 79% (utterance based) has been achieved which is close to the performance of humans on this database.

[1]  Astrid Paeschke,et al.  A database of German emotional speech , 2005, INTERSPEECH.

[2]  Douglas A. Reynolds,et al.  A Tutorial on Text-Independent Speaker Verification , 2004, EURASIP J. Adv. Signal Process..

[3]  G. Palm,et al.  Classifier fusion for emotion recognition from speech , 2007 .

[4]  Ramin Shaghaghi Kandovan,et al.  Optimization of speaker verification using adapted Gaussian mixture models for high quality databases , 2007 .

[5]  H. Hermansky,et al.  The modulation spectrum in the automatic recognition of speech , 1997, 1997 IEEE Workshop on Automatic Speech Recognition and Understanding Proceedings.

[6]  J. Platt Sequential Minimal Optimization : A Fast Algorithm for Training Support Vector Machines , 1998 .

[7]  Jeff A. Bilmes,et al.  A gentle tutorial of the em algorithm and its application to parameter estimation for Gaussian mixture and hidden Markov models , 1998 .

[8]  Pierre-Yves Oudeyer,et al.  The production and recognition of emotions in speech: features and algorithms , 2003, Int. J. Hum. Comput. Stud..

[9]  J. G. Taylor,et al.  Emotion recognition in human-computer interaction , 2005, Neural Networks.

[10]  Steven J. Simske,et al.  Recognition of emotions in interactive voice response systems , 2003, INTERSPEECH.

[11]  Malcolm Slaney,et al.  Construction and evaluation of a robust multifeature speech/music discriminator , 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[12]  Ryohei Nakatsu,et al.  Emotion Recognition in Speech Using Neural Networks , 2000, Neural Computing & Applications.

[13]  George N. Votsis,et al.  Emotion recognition in human-computer interaction , 2001, IEEE Signal Process. Mag..

[14]  Lori Lamel,et al.  Challenges in real-life emotion annotation and machine learning based detection , 2005, Neural Networks.

[15]  Misha Pavel,et al.  On the relative importance of various components of the modulation spectrum for automatic speech recognition , 1999, Speech Commun..

[16]  R. Plomp,et al.  Effect of reducing slow temporal modulations on speech reception. , 1994, The Journal of the Acoustical Society of America.

[17]  Zhigang Deng,et al.  Emotion recognition based on phoneme classes , 2004, INTERSPEECH.

[18]  Hynek Hermansky,et al.  Auditory Modeling in Automatic Recognition of Speech , 1996 .

[19]  Douglas E. Sturim,et al.  Support vector machines using GMM supervectors for speaker verification , 2006, IEEE Signal Processing Letters.

[20]  Frank Dellaert,et al.  Recognizing emotion in speech , 1996, Proceeding of Fourth International Conference on Spoken Language Processing. ICSLP '96.

[21]  Oudeyer Pierre-Yves,et al.  The production and recognition of emotions in speech: features and algorithms , 2003 .

[22]  Hynek Hermansky,et al.  RASTA processing of speech , 1994, IEEE Trans. Speech Audio Process..

[23]  A. Atiya,et al.  Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond , 2005, IEEE Transactions on Neural Networks.