Detection of Emotional Events utilizing Support Vector Methods in an Active Learning HCI Scenario

In recent years, the fields of affective computing and emotion recognition have received steadily increasing attention, and the creation and analysis of multi-modal corpora in particular has been the focus of intense research. Plausible annotation of such data, however, remains an enormous problem: emotion annotation is time-consuming, cumbersome, and highly sensitive to the individual annotator. Furthermore, emotional reactions are often very sparse in HCI scenarios, resulting in a large annotation overhead to gather the interesting moments of a recording, which in turn are exactly the moments needed to develop powerful features, classifiers, and fusion architectures. Active learning techniques can improve the annotation process, since the annotator is asked to label only the relevant instances of a given dataset. In this work, an unsupervised one-class Support Vector Machine is used to build a background model of non-emotional sequences on a novel HCI dataset. The human annotator is iteratively asked to label the instances that are not well explained by the background model, which renders them candidates for interesting events such as emotional reactions that diverge from the norm. The outcome of the active learning procedure is a reduced dataset of only 14% of the size of the original that still contains most of the significant information, in this case more than 75% of the emotional events.
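
To make the querying loop concrete, the following minimal sketch implements novelty-driven sample selection with scikit-learn's OneClassSVM standing in for the background model. The feature matrix, RBF kernel settings, batch size, and the strategy of refitting on the remaining unlabeled pool are illustrative assumptions, not the paper's actual configuration.

```python
import numpy as np
from sklearn.svm import OneClassSVM

def active_novelty_query(features, n_rounds=10, batch_size=20, nu=0.1):
    """Iteratively query the instances least explained by a one-class SVM
    background model. `features` is an (n_samples, n_dims) array; the
    returned indices are candidates for manual emotion annotation.
    All parameter values here are hypothetical defaults."""
    unlabeled = np.arange(len(features))
    queried = []
    for _ in range(n_rounds):
        # Refit the background model on everything still unlabeled;
        # nu upper-bounds the fraction of training points treated as outliers.
        model = OneClassSVM(kernel="rbf", gamma="scale", nu=nu)
        model.fit(features[unlabeled])
        # Low decision scores mark points the background model explains
        # poorly; take the batch with the lowest scores as query candidates.
        scores = model.decision_function(features[unlabeled])
        worst = np.argsort(scores)[:batch_size]
        queried.extend(unlabeled[worst].tolist())
        unlabeled = np.delete(unlabeled, worst)
    return queried  # present these segments to the human annotator

```

In such a scheme, nu trades off how aggressively the model flags novelties, and each queried batch would in practice be shown to the annotator before the next refit, so that confirmed non-emotional segments keep refining the background model.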
