Exploring dataset similarities using PCA-based feature selection

In emotion recognition from speech, several well-established corpora are used for the development of classification engines. These corpora are annotated differently, and the community uses a variety of feature extraction schemes. The aim of this paper is to identify promising features for individual corpora and then compare the results in order to propose optimal features across data sets, introducing a new ranking method. This in turn enables us to present a method for the automatic identification of groups of corpora with similar characteristics, which answers an urgent question in classifier development: whether data from different corpora is similar enough to be used jointly as training material, thereby overcoming the shortage of material in matching domains. We compare the results of this method with manual groupings of corpora. We consider the established emotional speech corpora AVIC, ABC, DES, EMO-DB, ENTERFACE, SAL, SMARTKOM, SUSAS, and VAM; our approach, however, is general.
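To make the approach concrete, the sketch below illustrates one plausible reading of PCA-based feature ranking and pairwise corpus comparison in Python. The scoring heuristic (absolute component loadings weighted by explained variance) and the Spearman rank correlation used to compare corpora are illustrative assumptions, not the exact ranking method introduced in the paper.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
from scipy.stats import spearmanr

def pca_feature_ranking(X, n_components=5):
    """Rank features by absolute loadings on the leading principal
    components, weighted by each component's explained variance."""
    X = StandardScaler().fit_transform(X)  # PCA is scale-sensitive
    pca = PCA(n_components=n_components).fit(X)
    # components_: (n_components, n_features); sum each feature's
    # absolute contribution, weighted by explained-variance ratio
    scores = (np.abs(pca.components_)
              * pca.explained_variance_ratio_[:, None]).sum(axis=0)
    return np.argsort(scores)[::-1]  # feature indices, best first

def corpus_similarity(X_a, X_b, n_components=5):
    """Compare two corpora via the rank correlation of their
    per-corpus PCA feature rankings (1.0 = identical ranking)."""
    order_a = pca_feature_ranking(X_a, n_components)
    order_b = pca_feature_ranking(X_b, n_components)
    # invert the orderings to get each feature's rank, then correlate
    rho, _ = spearmanr(np.argsort(order_a), np.argsort(order_b))
    return rho

# Toy usage: two synthetic "corpora" sharing one feature space.
rng = np.random.default_rng(0)
X_a = rng.normal(size=(200, 20))
X_b = X_a + 0.1 * rng.normal(size=(200, 20))  # near-duplicate corpus
print(corpus_similarity(X_a, X_b))  # close to 1.0 for similar corpora
```

Under these assumptions, grouping corpora with similar characteristics would then amount to clustering the resulting pairwise similarity matrix, for instance by agglomerative clustering.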
