Evaluation of emotion recognition from speech

Over the last few years, interest in the classification of paralinguistic information has grown considerably. However, in comparison to related speech processing tasks such as automatic speech and speaker recognition, practically no standardised corpora or test conditions exist for comparing performance under exactly the same conditions. The successive challenges organised at the INTERSPEECH conferences, the world's largest conferences on automatic speech processing, are therefore important for comparing the performance of statistical classifiers. In this paper, we summarize the methods commonly used by challenge participants and the results obtained by the Koç University Multimedia, Vision and Graphics Laboratory on the same tasks. Our main contributions are formant position-based weighted spectral features, which emphasize emotion in speech, and RANSAC (Random Sample Consensus)-based training data selection, which prunes possible outliers from the training set.
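The RANSAC-based selection idea mentioned above can be sketched as follows: repeatedly fit a model on a small random subset of the training data, count how many training samples the model labels consistently (the consensus set), and keep the largest such set as the pruned training data. This is a minimal illustrative sketch only; the toy nearest-centroid classifier and all names below are assumptions for the example, not the authors' actual feature extraction or classification pipeline.

```python
import random

def fit_centroids(samples):
    """Fit a toy nearest-centroid model from (feature_vector, label) pairs."""
    sums, counts = {}, {}
    for x, y in samples:
        s = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            s[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in s] for y, s in sums.items()}

def predict(centroids, x):
    """Return the label of the nearest class centroid (squared Euclidean)."""
    return min(centroids,
               key=lambda y: sum((a - b) ** 2 for a, b in zip(x, centroids[y])))

def ransac_select(train, n_iter=50, subset_size=4, seed=0):
    """RANSAC-style training data selection: keep the largest consensus set,
    i.e. the samples whose labels agree with a model fitted on a random
    subset, maximised over n_iter random trials."""
    rng = random.Random(seed)
    best = []
    for _ in range(n_iter):
        model = fit_centroids(rng.sample(train, subset_size))
        inliers = [(x, y) for x, y in train if predict(model, x) == y]
        if len(inliers) > len(best):
            best = inliers
    return best

# Toy data: two clean clusters plus two mislabelled "outlier" samples.
train = [((0, 0), "A"), ((0, 1), "A"), ((1, 0), "A"), ((1, 1), "A"),
         ((0.5, 0.5), "A"),
         ((5, 5), "B"), ((5, 6), "B"), ((6, 5), "B"), ((6, 6), "B"),
         ((5.5, 5.5), "B"),
         ((4.8, 5.2), "A"), ((0.2, 0.9), "B")]  # last two are mislabelled
pruned = ransac_select(train)
```

On this toy data the consensus step rejects the two mislabelled samples, leaving a cleaner training set; in the actual challenge systems the subset model would be a speech-emotion classifier over spectral features rather than this illustrative centroid model.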
