Improved speech emotion recognition using error correcting codes

We propose using error correcting codes (ECC) in a multi-class audio emotion recognition scenario to improve the accuracy of emotion recognition in speech. We model the emotion recognition system as a noisy communication channel, which motivates the use of ECC in the recognition process. We assume that the emotion recognition process consists of an audio feature extraction module followed by an artificial neural network (ANN) that classifies the emotion, with each emotion class represented by a binary string. In our formulation, the noisy communication channel is the insufficiently trained ANN classifier, whose imperfect learning produces erroneous binary strings and hence misclassified emotions. In our system, we use ECC to encode the binary string representing each emotion class with a Block Coder (BC). Experiments on the Emo-DB database show that ECC improves the recognition accuracy of the emotion classification system by 4.6% to 9.35% compared with the baseline ANN-based emotion classification system.
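The channel view can be made concrete with a small sketch. The abstract does not say which block code is used, so the example below assumes a systematic Hamming(7,4) code, an ANN with seven sigmoid outputs (one per code bit), hard-decision thresholding at 0.5, and minimum-Hamming-distance decoding over the codewords assigned to the seven Emo-DB emotion classes. The class ordering and the names `encode`, `decode`, `CODEBOOK`, and `EMOTIONS` are hypothetical illustrations, not the authors' implementation.

```python
import numpy as np

# Generator matrix of a systematic Hamming(7,4) code: codeword = [d1..d4 | p1 p2 p3].
# ASSUMPTION: the paper only says "Block Coder (BC)"; Hamming(7,4) is a stand-in.
G = np.array([
    [1, 0, 0, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
], dtype=int)

# Hypothetical index assignment for the seven Emo-DB emotion categories.
EMOTIONS = ["anger", "boredom", "disgust", "fear", "happiness", "sadness", "neutral"]


def class_to_bits(k, n_bits=4):
    """Plain binary string for class index k (MSB first), zero-padded to 4 data bits."""
    return np.array([(k >> (n_bits - 1 - i)) & 1 for i in range(n_bits)], dtype=int)


def encode(k):
    """Block-encode the class bits; these 7 bits become the ANN's training targets."""
    return class_to_bits(k) @ G % 2


# One codeword per emotion class; any two differ in at least 3 bit positions.
CODEBOOK = np.array([encode(k) for k in range(len(EMOTIONS))])


def decode(ann_outputs):
    """Threshold the ANN's per-bit outputs, then pick the nearest codeword."""
    bits = (np.asarray(ann_outputs) >= 0.5).astype(int)
    distances = np.sum(CODEBOOK != bits, axis=1)  # Hamming distance to each class
    return int(np.argmin(distances))


if __name__ == "__main__":
    # Simulated ANN output for "fear" (codeword 0011100) with the last bit wrong.
    noisy = [0.1, 0.2, 0.8, 0.9, 0.7, 0.1, 0.6]
    print(EMOTIONS[decode(noisy)])  # fear
```

Because every pair of codewords differs in at least three positions, any single erroneous output bit from the "noisy" ANN is still mapped back to the correct class, whereas a bare 3-bit class encoding would misclassify it.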