Relative entropy normalized Gaussian supervector for speech emotion recognition using kernel extreme learning machine

Speech emotion recognition is a challenging and significant task. On the one hand, the emotion features need to be robust enough to capture the emotion information; on the other, the machine learning algorithm needs to be robust when modeling the utterance. In this paper, we present a novel speech emotion recognition framework that addresses these two challenges. Relative Entropy based Normalization (REN) is proposed to normalize the supervectors of the Gaussian Mixture Model-Universal Background Model (GMM-UBM), and the normalized supervectors serve as the emotion features. The Kernel Extreme Learning Machine (KELM) is adopted as the classifier to identify the emotion represented by the normalized supervectors. Experimental results on the EMR_1309 corpus show that the proposed framework outperforms state-of-the-art i-vector based systems.
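
To make the classifier stage concrete, the following is a minimal sketch of a kernel extreme learning machine in its commonly used closed form, where the output weights solve (K + I/C) alpha = T for a kernel matrix K and one-hot targets T. It is not the paper's implementation: the RBF kernel, the values of `C` and `gamma`, the class name `KernelELM`, and the toy two-dimensional vectors standing in for normalized supervectors are all assumptions made for illustration. The REN step is not shown, because the abstract does not define it.

```python
import numpy as np


def rbf_kernel(A, B, gamma):
    """RBF kernel matrix between the rows of A and the rows of B."""
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-gamma * np.maximum(sq, 0.0))


class KernelELM:
    """Hypothetical minimal KELM classifier (illustrative, not the paper's code)."""

    def __init__(self, C=10.0, gamma=0.5):
        self.C = C          # regularization strength (assumed value)
        self.gamma = gamma  # RBF kernel width (assumed value)

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        self.classes_ = np.unique(y)
        # One-hot target matrix T, one column per emotion class.
        T = (y[:, None] == self.classes_[None, :]).astype(float)
        K = rbf_kernel(X, X, self.gamma)
        # Closed-form output weights: solve (K + I/C) alpha = T.
        self.alpha_ = np.linalg.solve(K + np.eye(len(X)) / self.C, T)
        self.X_ = X
        return self

    def predict(self, X):
        K = rbf_kernel(np.asarray(X, dtype=float), self.X_, self.gamma)
        # The class with the largest output score wins.
        return self.classes_[np.argmax(K @ self.alpha_, axis=1)]


# Toy usage: two well-separated clusters stand in for normalized supervectors.
rng = np.random.default_rng(0)
X_train = np.vstack([rng.normal(0.0, 0.1, (20, 2)), rng.normal(3.0, 0.1, (20, 2))])
y_train = np.array(["neutral"] * 20 + ["angry"] * 20)
model = KernelELM().fit(X_train, y_train)
predicted = model.predict(np.array([[0.0, 0.0], [3.0, 3.0]]))
```

Training reduces to a single linear solve, so no iterative optimization is needed; the cost is dominated by the N x N kernel matrix over the N training utterances.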
