Emotion recognition using semi-supervised feature selection with speaker normalization

Feature selection is among the most widely used dimensionality reduction techniques in speech emotion recognition. However, most existing methods neither preserve the manifold structure of the data nor exploit the information carried by unlabeled samples, and therefore fail to select a good feature subset for speech emotion recognition. This paper presents a semi-supervised feature selection method that preserves both the manifold structure and the category structure of the data while making use of unlabeled samples. Because the manifold of speech data is shaped by factors such as emotion, speaker, and sentence, a new speaker normalization method is also proposed; it achieves good normalization even when only a small number of samples per speaker is available, which makes it applicable to most practical speech emotion recognition systems. Experiments validate the effectiveness of the proposed semi-supervised feature selection method combined with the speaker normalization for speech emotion recognition.
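The abstract does not spell out the formulations, so the following minimal Python sketch is only illustrative of the two ingredients it builds on: per-speaker z-score normalization of acoustic features (a common baseline; the proposed method additionally targets the few-samples-per-speaker case) and a Laplacian-score-style feature ranking computed on a neighborhood graph built from labeled and unlabeled utterances together. All function names, parameters, and the specific graph construction below are assumptions, not the paper's exact criterion, which also preserves category structure.

```python
import numpy as np

def speaker_zscore(X, speakers):
    """Normalize each feature to zero mean and unit variance per speaker.
    A common baseline; not the paper's normalization, which is designed
    to work when few samples per speaker are available."""
    Xn = X.astype(float).copy()
    for s in np.unique(speakers):
        idx = speakers == s
        mu = Xn[idx].mean(axis=0)
        sigma = Xn[idx].std(axis=0) + 1e-8
        Xn[idx] = (Xn[idx] - mu) / sigma
    return Xn

def laplacian_score(X, k=5):
    """Rank features by their Laplacian score on a k-NN heat-kernel graph
    built over all samples (labeled and unlabeled alike); lower scores
    indicate features that better preserve the local manifold structure."""
    n = X.shape[0]
    # pairwise squared distances and heat-kernel affinities
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / (np.median(d2) + 1e-8))
    # keep only the k nearest neighbours of each sample (symmetrized)
    order = np.argsort(d2, axis=1)
    mask = np.zeros_like(W, dtype=bool)
    mask[np.arange(n)[:, None], order[:, 1:k + 1]] = True
    W = W * (mask | mask.T)
    D = W.sum(axis=1)            # vertex degrees
    L = np.diag(D) - W           # graph Laplacian
    scores = []
    for f in X.T:
        f = f - (f @ D) / D.sum()          # remove the degree-weighted mean
        scores.append((f @ L @ f) / (f @ (D * f) + 1e-12))
    return np.array(scores)      # smaller score = more manifold-preserving
```

A typical usage under these assumptions would be to apply `speaker_zscore` first, compute `laplacian_score` on the normalized features of all (labeled and unlabeled) utterances, and keep the lowest-scoring features before training the emotion classifier.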
