Dimensionality reduction for speech emotion features by multiscale kernels

To achieve efficient and compact low-dimensional features for speech emotion recognition, this paper proposes a novel feature reduction method using multiscale kernels within the graph embedding framework. Based on a Fisher discriminant embedding graph, multiscale Gaussian kernels are used to construct an optimal linear combination of Gram matrices for multiple kernel learning. To evaluate the proposed method, comprehensive experiments are conducted on various corpora using different public feature sets extracted with the open-source toolbox openSMILE. The results show that the proposed method achieves better performance than conventional linear dimensionality reduction methods and single-kernel methods.
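A minimal sketch of the idea described above, not the authors' implementation: Gaussian Gram matrices are computed at several kernel widths, combined into a single kernel, and projected with a Fisher-style graph embedding. The kernel widths, the uniform combination weights, and the toy data below are illustrative assumptions; the paper learns the combination weights via multiple kernel learning rather than fixing them.

```python
# Sketch only: multiscale Gaussian kernels + Fisher-style kernel graph embedding.
import numpy as np
from scipy.linalg import eigh
from scipy.spatial.distance import cdist

def multiscale_gram(X, sigmas):
    """Return one Gaussian (RBF) Gram matrix per kernel width sigma."""
    sq = cdist(X, X, metric="sqeuclidean")
    return [np.exp(-sq / (2.0 * s ** 2)) for s in sigmas]

def fisher_graph_weights(y):
    """Within-class (W) and between-class (B) adjacency of a Fisher-style graph."""
    same = (y[:, None] == y[None, :]).astype(float)
    W = same - np.eye(len(y))          # connect samples sharing a class label
    B = 1.0 - same                     # connect samples of different classes
    return W, B

def embed(K, W, B, dim, reg=1e-6):
    """Kernel graph embedding: favor between-class over within-class scatter."""
    Lw = np.diag(W.sum(1)) - W         # within-class graph Laplacian
    Lb = np.diag(B.sum(1)) - B         # between-class graph Laplacian
    Sw = K @ Lw @ K + reg * np.eye(len(K))   # regularized within-class scatter
    Sb = K @ Lb @ K                          # between-class scatter
    vals, vecs = eigh(Sb, Sw)          # generalized eigenproblem (ascending order)
    A = vecs[:, ::-1][:, :dim]         # top eigenvectors = projection coefficients
    return K @ A                       # low-dimensional features of the samples

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(60, 20))        # stand-in for openSMILE feature vectors
    y = np.repeat(np.arange(3), 20)      # three emotion classes (toy labels)
    grams = multiscale_gram(X, sigmas=[0.5, 1.0, 2.0, 4.0])
    weights = np.full(len(grams), 1.0 / len(grams))  # uniform, not learned MKL weights
    K = sum(w * G for w, G in zip(weights, grams))
    Z = embed(K, *fisher_graph_weights(y), dim=2)
    print(Z.shape)                       # (60, 2) reduced features
```

In the full method, the uniform `weights` would be replaced by coefficients optimized jointly with the embedding under the multiple kernel learning criterion.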
