A feature selection and feature fusion combination method for speaker-independent speech emotion recognition

To enhance the recognition rate of speaker-independent speech emotion recognition, a combined feature selection and feature fusion method based on multiple kernel learning (MKL) is presented. First, MKL is used to obtain sparse feature subsets. The features selected at least n times are recombined into a subset called the n-subset, and the optimal n is determined by 10-fold cross-validation. Second, feature fusion is performed at the kernel level: each kind of feature is associated with a kernel, and, unlike previous studies, the full feature set is also associated with its own kernel. All of the kernels are added together to obtain a combination kernel. The final recognition rate for seven emotions on the Berlin Database of Emotional Speech is 83.10%, which outperforms state-of-the-art results and demonstrates the effectiveness of the method. The results also show that MFCCs play a crucial role in speech emotion recognition.
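The n-subset selection step can be illustrated with a short sketch: run a sparse selector several times, count how often each feature is chosen, keep the features selected at least n times, and pick n by 10-fold cross-validation. The abstract specifies sparse MKL as the selector; in this minimal sketch an L1-penalized linear SVM stands in for it, and the run-per-fold protocol, regularization strength, and data arrays are all illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import LinearSVC, SVC

def selection_counts(X, y, n_runs=10, C=0.1, seed=0):
    """Count how many runs select each feature (nonzero weight).

    The paper uses sparse MKL; here an L1-penalized linear SVM is a
    stand-in sparse selector, run once per fold (an assumption).
    """
    counts = np.zeros(X.shape[1], dtype=int)
    skf = StratifiedKFold(n_splits=n_runs, shuffle=True, random_state=seed)
    for train_idx, _ in skf.split(X, y):
        clf = LinearSVC(penalty="l1", dual=False, C=C, max_iter=5000)
        clf.fit(X[train_idx], y[train_idx])
        # A feature counts as selected if any class weight is nonzero.
        selected = np.any(np.abs(clf.coef_) > 1e-8, axis=0)
        counts += selected.astype(int)
    return counts

def best_n_subset(X, y, counts):
    """Pick n by 10-fold CV accuracy over the candidate n-subsets."""
    best_n, best_acc, best_mask = None, -1.0, None
    for n in range(1, counts.max() + 1):
        mask = counts >= n  # features selected at least n times
        if not mask.any():
            break
        acc = cross_val_score(SVC(kernel="rbf"), X[:, mask], y, cv=10).mean()
        if acc > best_acc:
            best_n, best_acc, best_mask = n, acc, mask
    return best_n, best_mask
```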

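The kernel-level fusion step follows the abstract directly: one kernel per feature type, one kernel over the full feature set, all summed into a combination kernel for an SVM. A minimal sketch is below; the RBF kernel choice, the gamma values, and the column groups (standing in for MFCC-, pitch-, and energy-related features) are illustrative assumptions, and the commented usage assumes train/test arrays prepared elsewhere.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import SVC

def combined_kernel(X_a, X_b, groups, gamma=None):
    """Sum of per-group RBF kernels plus a full-feature-set kernel.

    A sum of valid kernels is itself a valid kernel, so the result can
    be fed to an SVM as a precomputed Gram matrix.
    """
    K = rbf_kernel(X_a, X_b, gamma=gamma)  # kernel over the full feature set
    for idx in groups:                     # one kernel per feature type
        K += rbf_kernel(X_a[:, idx], X_b[:, idx], gamma=gamma)
    return K

# Hypothetical column groups for three acoustic feature types (placeholders).
groups = [np.arange(0, 39),    # e.g. MFCC-related columns
          np.arange(39, 45),   # e.g. pitch-related columns
          np.arange(45, 50)]   # e.g. energy-related columns

# Assumed usage with X_train, y_train, X_test prepared elsewhere:
# K_train = combined_kernel(X_train, X_train, groups)
# clf = SVC(kernel="precomputed").fit(K_train, y_train)
# K_test = combined_kernel(X_test, X_train, groups)  # shape (n_test, n_train)
# y_pred = clf.predict(K_test)
```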