Ensemble softmax regression model for speech emotion recognition

Automatic emotion recognition from speech signals is an important research area. Many speech emotion recognition methods have been proposed, among which ensemble learning is an effective approach. However, ensemble methods still face problems such as the curse of dimensionality and the difficulty of ensuring diversity among the base classifiers. To overcome these problems, this paper proposes an ensemble Softmax regression model for speech emotion recognition (ESSER). It applies feature extraction methods based on markedly different principles to generate the subspaces for the base classifiers, so that diversity among the base classifiers is ensured. Furthermore, a feature selection method that selects features according to the global structure of the data is used to reduce the dimension of the subspaces, which further increases the diversity of the base classifiers and helps overcome the curse of dimensionality. Once diversity is ensured, the performance of an ensemble classifier depends heavily on the ability of its base classifier, so it is reasonable for ESSER to use Softmax as the base classifier, since Softmax has shown its superiority in speech emotion recognition. Experiments validate the proposed approach in terms of speech emotion recognition performance.
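The general idea of training one Softmax regression base classifier per feature subspace and then combining their outputs can be illustrated with a minimal sketch. This is not the ESSER implementation: the toy data, the column-index "subspaces" (standing in for the outputs of different feature extraction methods), the class names, the training hyperparameters, and the probability-averaging combination rule are all assumptions made for illustration, and the feature selection step is omitted.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of class scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class SoftmaxRegression:
    """Multinomial logistic regression trained by batch gradient descent."""

    def __init__(self, n_features, n_classes):
        # One weight row per class; the last column is the bias.
        self.W = [[0.0] * (n_features + 1) for _ in range(n_classes)]

    def predict_proba(self, x):
        xb = list(x) + [1.0]
        return softmax([sum(w * v for w, v in zip(row, xb)) for row in self.W])

    def fit(self, X, y, lr=0.5, epochs=300):
        n = len(X)
        for _ in range(epochs):
            grad = [[0.0] * len(row) for row in self.W]
            for x, label in zip(X, y):
                xb = list(x) + [1.0]
                for k, p in enumerate(self.predict_proba(x)):
                    err = p - (1.0 if k == label else 0.0)  # cross-entropy gradient
                    for j, v in enumerate(xb):
                        grad[k][j] += err * v
            for k, row in enumerate(self.W):
                for j in range(len(row)):
                    row[j] -= lr * grad[k][j] / n
        return self


class SubspaceEnsemble:
    """One softmax base classifier per subspace (hypothetical helper class)."""

    def __init__(self, subspaces, n_classes):
        self.subspaces = subspaces  # lists of column indices, one per base classifier
        self.n_classes = n_classes
        self.models = [SoftmaxRegression(len(cols), n_classes) for cols in subspaces]

    def fit(self, X, y):
        for cols, model in zip(self.subspaces, self.models):
            model.fit([[x[j] for j in cols] for x in X], y)
        return self

    def predict_proba(self, x):
        # Assumed combination rule: average the base classifiers' probabilities.
        avg = [0.0] * self.n_classes
        for cols, model in zip(self.subspaces, self.models):
            for k, p in enumerate(model.predict_proba([x[j] for j in cols])):
                avg[k] += p / len(self.models)
        return avg

    def predict(self, x):
        probs = self.predict_proba(x)
        return max(range(self.n_classes), key=probs.__getitem__)


def make_toy_data(per_class=30, seed=0):
    """Three synthetic Gaussian classes in 4 dimensions (not speech data)."""
    rng = random.Random(seed)
    centers = [[0, 0, 0, 0], [3, 0, 3, 0], [0, 3, 0, 3]]
    X, y = [], []
    for label, center in enumerate(centers):
        for _ in range(per_class):
            X.append([c + rng.gauss(0.0, 0.5) for c in center])
            y.append(label)
    return X, y


X, y = make_toy_data()
ensemble = SubspaceEnsemble(subspaces=[[0, 1], [2, 3]], n_classes=3).fit(X, y)
accuracy = sum(ensemble.predict(x) == t for x, t in zip(X, y)) / len(y)
print(f"training accuracy: {accuracy:.2f}")
```

In this sketch the two subspaces are disjoint column groups, so each base classifier sees a different view of the data; in a real system each view would come from a separate acoustic feature extractor followed by dimensionality reduction.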
