Using a PCA-based dataset similarity measure to improve cross-corpus emotion recognition

Abstract: In emotion recognition from speech, large amounts of training material are needed to develop classification engines. As most current corpora do not supply enough material, combining different datasets is advisable. Unfortunately, data recording is done differently across corpora, and various emotion elicitation and emotion annotation methods are used. Therefore, corpora usually cannot be combined without further effort. This manuscript aims to answer the question of which corpora are similar enough to be used jointly as training material. A corpus similarity measure based on PCA-ranked features is presented, and similar datasets are identified. To evaluate our method, we used nine well-known benchmark corpora and automatically identified a subset of the six most similar datasets. To test whether this identified subset influences classification performance, we conducted several cross-corpus emotion recognition experiments comparing our six most similar datasets with other combinations. Our most similar subset outperforms all other combinations of corpora, the combination of all nine datasets, and feature normalization techniques. Side-effects influencing the recognition rate were also excluded. Finally, the predictive power of our measure is shown: an increasing similarity score, which expresses decreasing similarity, results in decreasing recognition rates. Thus, our similarity measure answers the question of which corpora should be included in joint training.
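The abstract describes the measure only at a high level. As a rough illustration of the idea, the sketch below compares two corpora by how much their PCA-derived feature rankings agree. This is a minimal, assumption-laden sketch, not the paper's exact algorithm: the function names (`pca_feature_ranking`, `rank_distance`) are illustrative, ranking only by the first principal component and scoring via top-k overlap are simplifying assumptions.

```python
import numpy as np

def pca_feature_ranking(X):
    # Standardize each feature, then rank features by the magnitude of
    # their loading on the first principal component.
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    cov = np.cov(Xs, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns ascending eigenvalues
    first_pc = eigvecs[:, -1]                # loadings of the top component
    return np.argsort(-np.abs(first_pc))     # feature indices, most important first

def rank_distance(X_a, X_b, top_k=10):
    # Dissimilarity score between two corpora (rows = instances,
    # columns = shared acoustic features): the fraction of the top-k
    # PCA-ranked features that differ. 0 = identical top-k, 1 = disjoint.
    top_a = set(pca_feature_ranking(X_a)[:top_k])
    top_b = set(pca_feature_ranking(X_b)[:top_k])
    return 1.0 - len(top_a & top_b) / top_k
```

Under this reading, a lower score means the corpora stress similar features and are better candidates for joint training; computing the score for all corpus pairs would then let one pick the most similar subset.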
