Target-Adapted Subspace Learning for Cross-Corpus Speech Emotion Recognition

For cross-corpus speech emotion recognition (SER), a crucial issue is how to obtain feature representations that eliminate the discrepancy between the feature distributions of the source and target domains. In this paper, we propose a Target-adapted Subspace Learning (TaSL) method for cross-corpus SER. TaSL tries to find a projection subspace in which the features regress the labels more accurately and the gap between the feature distributions of the target and source domains is bridged effectively. Then, to obtain a better projection matrix, $\ell_1$-norm and $\ell_{2,1}$-norm penalty terms are added to different regularization terms, respectively. Finally, we conduct extensive experiments on three public corpora: EmoDB, eNTERFACE, and AFEW 4.0. The experimental results show that the proposed method achieves better performance than state-of-the-art methods on cross-corpus SER tasks.

key words: cross-corpus speech emotion recognition, transfer learning, domain adaptation, subspace learning
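
As a rough illustration of the two penalty types named in the abstract, the sketch below computes an entry-wise 1-norm and a row-wise 2,1-norm of a small matrix. The abstract does not say which term receives which penalty or how the projection matrix is laid out, so the matrix `P`, its shape, the row/column interpretation, and the function names `l1_norm` and `l21_norm` are all assumptions made for this example, not the authors' implementation.

```python
import math


def l1_norm(matrix):
    """Sum of absolute values of all entries (encourages entry-wise sparsity)."""
    return sum(abs(v) for row in matrix for v in row)


def l21_norm(matrix):
    """Sum of the Euclidean norms of the rows (encourages whole rows to vanish)."""
    return sum(math.sqrt(sum(v * v for v in row)) for row in matrix)


# Hypothetical 3x2 projection matrix (assumed layout: rows = features,
# columns = emotion classes). The values are made up for illustration.
P = [
    [3.0, 4.0],
    [0.0, 0.0],
    [-1.0, 0.0],
]

print(l1_norm(P))   # 3 + 4 + 0 + 0 + 1 = 8.0
print(l21_norm(P))  # sqrt(9 + 16) + 0 + sqrt(1) = 6.0
```

Under the assumed layout, a row that is entirely zero (the second row here) contributes nothing to the 2,1-norm, which is why this penalty is commonly used to discard whole features, whereas the 1-norm penalizes each entry individually.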
