Cross-lingual speech emotion recognition using canonical correlation analysis on principal component subspaces

This paper proposes an analytical approach to domain adaptation based on Kernel Canonical Correlation Analysis (KCCA). To generate the paired instances that KCCA requires, we mapped both source and target data onto the principal components of each domain, so that every instance is represented in both subspaces. To validate the approach, we performed pair-wise domain adaptation between four emotional speech corpora in different languages (English, German, Italian, and Polish), comparing against the Shared-Hidden-Layer Auto-Encoder (SHLA) and kernel-based principal components. On average, the proposed approach yielded higher classification performance than both baselines.
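To make the pairing construction concrete, below is a minimal sketch of the idea as described in the abstract. Everything here is an illustrative assumption rather than the authors' implementation: the data are random placeholders, the dimensionalities and kernel choices are arbitrary, and the KCCA solver follows one common regularised dual formulation (Hardoon et al., 2004) since the paper's exact solver is not given in this excerpt.

```python
import numpy as np
from scipy.linalg import solve
from sklearn.decomposition import PCA
from sklearn.metrics.pairwise import rbf_kernel

def center_kernel(K):
    # Double-centre a kernel matrix in feature space.
    n = K.shape[0]
    H = np.eye(n) - np.full((n, n), 1.0 / n)
    return H @ K @ H

def kcca(Kx, Ky, n_components=10, reg=1e-3):
    # Regularised kernel CCA in the dual: solve
    #   (Kx + reg*I)^-1 Ky (Ky + reg*I)^-1 Kx a = rho^2 a
    # for the leading dual directions a. This is one standard
    # formulation, assumed here for illustration.
    n = Kx.shape[0]
    I = np.eye(n)
    M = solve(Kx + reg * I, Ky) @ solve(Ky + reg * I, Kx)
    vals, vecs = np.linalg.eig(M)
    order = np.argsort(-vals.real)[:n_components]
    return vecs[:, order].real

# Placeholder corpora: Xs/Xt stand in for acoustic feature matrices
# (e.g. openSMILE functionals) from a source and a target language.
rng = np.random.default_rng(0)
Xs = rng.standard_normal((200, 60))
Xt = rng.standard_normal((150, 60))

# Fit one PCA per domain, then project ALL instances onto BOTH sets of
# principal components: every instance now has a (source-view,
# target-view) pair, which is exactly the paired input KCCA needs.
pca_s = PCA(n_components=20).fit(Xs)
pca_t = PCA(n_components=20).fit(Xt)
X_all = np.vstack([Xs, Xt])
view_src = pca_s.transform(X_all)  # all data in the source PC subspace
view_tgt = pca_t.transform(X_all)  # all data in the target PC subspace

# Kernelise each view and run KCCA; the projected features Z can then
# feed any emotion classifier trained on the source-domain labels.
Kx = center_kernel(rbf_kernel(view_src))
Ky = center_kernel(rbf_kernel(view_tgt))
alpha = kcca(Kx, Ky, n_components=10)
Z = Kx @ alpha
print(Z.shape)  # (350, 10): a shared representation across both domains
```

Note that the regulariser and the RBF kernel width strongly influence KCCA in practice; the values above are placeholders, not tuned settings.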
