Speech Emotion Recognition: Methods and Cases Study

This paper compares several approaches to speech emotion recognition and proposes an efficient solution based on a combination of them. A recurrent neural network (RNN) classifier is used to classify seven emotions found in the Berlin and Spanish databases, and its performance is compared with that of multivariate linear regression (MLR) and support vector machine (SVM) classifiers. The explored features are mel-frequency cepstrum coefficients (MFCC) and modulation spectral features (MSF). Results for different feature combinations and different databases are then compared and explained. Overall, the combination of MFCC and MSF features yields the highest accuracy on both databases: 90.05% on the Spanish emotional database with the RNN classifier, and 82.41% on the Berlin emotional database with MLR.
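To make the idea of combining feature sets concrete, the following minimal sketch shows feature-level fusion by concatenation, followed by a simple linear classifier. It is illustrative only and rests on several assumptions not stated in the abstract: the data are synthetic, the feature dimensions (12 and 8) are hypothetical, and the least-squares regression onto one-hot targets is a stand-in that may differ from the MLR formulation actually used in the paper.

```python
import numpy as np

def fuse_features(mfcc_stats, msf_stats):
    """Feature-level fusion: concatenate per-utterance feature vectors."""
    return np.hstack([mfcc_stats, msf_stats])

def fit_linear_classifier(X, y, n_classes, ridge=1e-3):
    """Least-squares regression onto one-hot class targets, with a bias term.

    A small ridge penalty keeps the normal equations well conditioned.
    """
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])   # append bias column
    Y = np.eye(n_classes)[y]                        # one-hot targets
    A = Xb.T @ Xb + ridge * np.eye(Xb.shape[1])
    return np.linalg.solve(A, Xb.T @ Y)

def predict(W, X):
    """Assign each utterance to the class with the largest regression output."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return np.argmax(Xb @ W, axis=1)

# Synthetic stand-in data (assumption): 7 emotion classes, 40 utterances each.
rng = np.random.default_rng(0)
n_classes, n_per_class = 7, 40
y = np.repeat(np.arange(n_classes), n_per_class)

# Hypothetical dimensions: 12 MFCC statistics and 8 MSF statistics per utterance.
mfcc_centers = rng.normal(scale=3.0, size=(n_classes, 12))
msf_centers = rng.normal(scale=3.0, size=(n_classes, 8))
mfcc = mfcc_centers[y] + rng.normal(size=(y.size, 12))
msf = msf_centers[y] + rng.normal(size=(y.size, 8))

fused = fuse_features(mfcc, msf)

# Random train/test split: 200 utterances for training, 80 for testing.
order = rng.permutation(y.size)
train, test = order[:200], order[200:]

W = fit_linear_classifier(fused[train], y[train], n_classes)
acc = float(np.mean(predict(W, fused[test]) == y[test]))
print(f"test accuracy on synthetic data: {acc:.2f}")
```

With real recordings, the synthetic arrays would be replaced by per-utterance MFCC and MSF statistics extracted from the audio. The accuracy printed here says nothing about the figures reported above.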
