Multimodal Affect Recognition in the Context of Human-Computer Interaction for Companion-Systems

In general, humans interact with each other through multiple modalities; the main channels are speech, facial expressions, and gestures. Bio-physiological data such as biopotentials can also convey valuable information that helps to interpret the communication. A Companion-System can exploit these modalities to enable efficient human-computer interaction (HCI). To do so, the multiple sources must be analyzed and combined within the technical system. However, so far only a few published studies have dealt with the fusion of three or more such modalities. This chapter addresses the processing steps necessary to develop a multimodal system that applies such fusion approaches.
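As a minimal illustration of the decision-level fusion idea discussed here, the following Python sketch combines per-modality classifier posteriors by a weighted average; the modality names, affect classes, and weights are illustrative assumptions, not the chapter's actual method or parameters.

```python
# Minimal sketch of decision-level (late) fusion for multimodal affect
# recognition. Modality names, class labels, and weights below are
# hypothetical placeholders for illustration only.

import numpy as np

CLASSES = ["neutral", "positive", "negative"]  # assumed affect classes


def fuse_decisions(posteriors: dict, weights: dict) -> str:
    """Combine per-modality class posteriors by a weighted average
    and return the label with the highest fused score."""
    fused = np.zeros(len(CLASSES))
    total = 0.0
    for modality, p in posteriors.items():
        w = weights.get(modality, 1.0)  # default weight if unspecified
        fused += w * np.asarray(p)
        total += w
    fused /= total  # normalize so fused scores sum to 1
    return CLASSES[int(np.argmax(fused))]


# Example: outputs of three hypothetical per-modality classifiers.
posteriors = {
    "speech":    np.array([0.2, 0.5, 0.3]),
    "face":      np.array([0.1, 0.7, 0.2]),
    "biosignal": np.array([0.4, 0.3, 0.3]),
}
weights = {"speech": 1.0, "face": 1.5, "biosignal": 0.5}

print(fuse_decisions(posteriors, weights))  # -> "positive"
```

The weighted average is only one of many possible combination rules; the chapter's fusion approaches may instead operate at the feature level or use trained combiner classifiers.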
