Feature Selection Based Transfer Subspace Learning for Speech Emotion Recognition

Cross-corpus speech emotion recognition has recently received considerable attention owing to the wide variety of emotional speech corpora. It uses one corpus as training data in order to recognize the emotions in another corpus, and generally involves two basic problems: feature matching and feature selection. Many previous works study these two problems independently, or focus only on the first. In this paper, we propose a novel algorithm, called feature selection based transfer subspace learning (FSTSL), to address both problems. To deal with the first problem, a latent common subspace is learned by reducing the difference between corpora while preserving their important properties. Meanwhile, we impose the $l_{2,1}$-norm on the projection matrix to deal with the second problem. In addition, to make the subspace robust and discriminative, the geometric information of the data is exploited simultaneously in the proposed FSTSL framework. Experiments on cross-corpus speech emotion recognition tasks demonstrate that the proposed method achieves encouraging results in comparison with state-of-the-art algorithms.
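To illustrate why an $l_{2,1}$-norm penalty on a projection matrix acts as feature selection, the following minimal sketch computes the norm as the sum of the Euclidean norms of the rows and ranks features by row norm. It is not the FSTSL objective or its optimizer, which the abstract does not specify; the function names `l21_norm` and `rank_features`, the convention that each row of `P` corresponds to one input feature, and the toy matrix are assumptions made for illustration only.

```python
import numpy as np


def l21_norm(P):
    """Return the l2,1-norm of P: the sum of the l2 norms of its rows."""
    return float(np.sum(np.linalg.norm(P, axis=1)))


def rank_features(P):
    """Rank features (rows of P) by descending row l2 norm.

    Assumption: row i of P holds the projection weights of input feature i,
    so a row with (near-)zero norm marks a feature the projection ignores.
    """
    scores = np.linalg.norm(P, axis=1)
    order = np.argsort(-scores, kind="stable")
    return order, scores


# Hypothetical 3-feature, 2-dimensional projection matrix (toy values).
P = np.array([
    [3.0, 4.0],   # strongly used feature, row norm 5
    [0.0, 0.0],   # discarded feature, row norm 0
    [0.6, 0.8],   # weakly used feature, row norm about 1
])

total = l21_norm(P)
order, scores = rank_features(P)
print("l2,1-norm:", total)
print("feature ranking:", order.tolist())
```

Because the penalty sums unsquared row norms, minimizing it tends to drive entire rows toward zero rather than shrinking individual entries, which is what allows rows (features) to be kept or dropped as a group.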
