Sentence-level emotion recognition based on decisions from subsentence segments

Emotion recognition from speech plays an important role in developing affective and intelligent systems. This study investigates sentence-level emotion recognition. We propose a two-step approach that leverages information from subsentence segments for the sentence-level decision. First, a segment-level emotion classifier generates predictions for the segments within a sentence. A second component then combines these segment predictions into a sentence-level decision. We evaluate different segment units (words, phrases, and time-based segments) and different decision-combination methods (majority vote, average of probabilities, and a Gaussian mixture model (GMM)). Experimental results on two different data sets show that the proposed method significantly outperforms the standard sentence-based classification approach. In addition, we find that time-based segments achieve the best performance; thus no speech recognition or alignment is needed with our method, which is important for developing language-independent emotion recognition systems.
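The two simpler combination rules mentioned above can be sketched as follows. This is a minimal illustration, not the paper's implementation; the function names and the use of NumPy are our own assumptions, and the GMM-based combiner is omitted.

```python
import numpy as np

def combine_majority_vote(segment_labels):
    """Sentence label = most frequent label among the segment-level predictions."""
    labels, counts = np.unique(segment_labels, return_counts=True)
    return labels[np.argmax(counts)]

def combine_average_prob(segment_probs):
    """Sentence label = argmax of the mean of the segments' posterior vectors.

    segment_probs: array of shape (n_segments, n_emotion_classes).
    """
    return int(np.argmax(np.mean(np.asarray(segment_probs), axis=0)))

# Hypothetical usage: three segments, two emotion classes (0 = neutral, 1 = angry)
print(combine_majority_vote(["angry", "angry", "neutral"]))      # -> "angry"
print(combine_average_prob([[0.2, 0.8], [0.6, 0.4], [0.1, 0.9]]))  # -> 1
```

Either rule turns per-segment decisions into a single sentence-level decision without requiring any word- or phrase-level alignment.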