Gender-dependent emotion recognition based on HMMs and SPHMMs

It is well known that emotion recognition performance is far from ideal. This work aims to improve emotion recognition performance by employing a two-stage recognizer that integrates a gender recognizer and an emotion recognizer into one system. Hidden Markov Models (HMMs) and Suprasegmental Hidden Markov Models (SPHMMs) are used as the classifiers in the two-stage recognizer. The recognizer has been tested on two separate emotional speech databases: our own collected database and the Emotional Prosody Speech and Transcripts database. Six basic emotions, including the neutral state, are used in each database. Our results show that emotion recognition performance based on the two-stage approach (the gender-dependent emotion recognizer) improves on that of an emotion recognizer without gender information and an emotion recognizer with correct gender information by an average of 11 % and 5 %, respectively. This work also shows that the highest emotion identification performance occurs when the classifiers are completely biased towards the suprasegmental models, with no contribution from the acoustic models. The results achieved with the two-stage framework fall within 2.28 % of those obtained in a subjective assessment by human judges.
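To make the two-stage decision rule concrete, the following is a minimal sketch under stated assumptions, not the paper's implementation. The model objects, their score() interface, the specific six-emotion label set, and the weighting factor alpha are all illustrative; alpha interpolates between the acoustic (HMM) and suprasegmental (SPHMM) log-likelihoods, and alpha = 1.0 corresponds to the complete bias toward suprasegmental models that gave the best performance in this work.

```python
# A minimal sketch (not the paper's implementation) of the two-stage,
# gender-dependent emotion recognizer. All names below are illustrative
# assumptions: the model classes, score() interface, emotion set, and alpha.

EMOTIONS = ["neutral", "angry", "happy", "sad", "disgust", "fear"]  # assumed six-emotion set
GENDERS = ["male", "female"]

# alpha weights acoustic (HMM) vs. suprasegmental (SPHMM) log-likelihoods;
# alpha = 1.0 means the classifier is completely biased toward SPHMMs.
ALPHA = 1.0


class DummyModel:
    """Stand-in for a trained HMM or SPHMM; score() returns a log-likelihood."""

    def __init__(self, loglik: float):
        self.loglik = loglik

    def score(self, features) -> float:
        return self.loglik


def combined_score(features, hmm, sphmm, alpha: float = ALPHA) -> float:
    """Assumed weighted combination of the two model scores:
    log P = (1 - alpha) * log P_HMM + alpha * log P_SPHMM."""
    return (1.0 - alpha) * hmm.score(features) + alpha * sphmm.score(features)


def recognize(features, gender_models, emotion_models):
    """Two-stage decision rule.

    gender_models:  {gender: model}
    emotion_models: {gender: {emotion: (hmm, sphmm)}}
    """
    # Stage 1: identify the speaker's gender from the gender models.
    gender = max(GENDERS, key=lambda g: gender_models[g].score(features))
    # Stage 2: pick the emotion whose gender-dependent models best fit
    # the utterance, scoring with the HMM/SPHMM combination above.
    emotion = max(
        EMOTIONS,
        key=lambda e: combined_score(features, *emotion_models[gender][e]),
    )
    return gender, emotion
```

Restricting stage 2 to the models of the recognized gender halves the emotion search space; that restriction is what distinguishes the gender-dependent recognizer from a gender-blind one.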
