Fusion of Fragmentary Classifier Decisions for Affective State Recognition

Real human-computer interaction systems based on different modalities face the problem that not all information channels are available at every time step. Nevertheless, an estimate of the current user state is required at any time to enable the system to react instantaneously on the basis of the modalities that are available. A novel approach to the decision fusion of fragmentary classifications is therefore proposed and evaluated empirically on audio and video signals from a corpus of non-acted user behavior. It is shown that visual and prosodic analysis successfully complement each other, leading to an outstanding performance of the fusion architecture.
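
To illustrate the general idea of fusing fragmentary classifier decisions (not the specific fusion architecture proposed here), the following minimal Python sketch combines whatever per-modality class-probability estimates are available at each time step by averaging them and falls back to a uniform distribution when no modality delivers a decision. The variable names, the two-class setup, and the simple averaging rule are hypothetical assumptions made purely for illustration.

import numpy as np

# Hypothetical per-time-step classifier outputs: class-probability vectors,
# or None when a modality delivered no decision (e.g., face not detected,
# no speech activity).
audio_decisions = [np.array([0.7, 0.3]), None, np.array([0.6, 0.4])]
video_decisions = [None, np.array([0.2, 0.8]), np.array([0.5, 0.5])]

def fuse_fragmentary(decision_streams, n_classes=2):
    """Fuse whatever modality decisions are available at each time step.

    Missing decisions (None) are skipped; if no modality is available at a
    step, a uniform distribution is used so an estimate exists at any time.
    """
    fused = []
    for step in zip(*decision_streams):
        available = [d for d in step if d is not None]
        if available:
            fused.append(np.mean(available, axis=0))        # average the available posteriors
        else:
            fused.append(np.full(n_classes, 1.0 / n_classes))  # uniform fallback
    return fused

for t, p in enumerate(fuse_fragmentary([audio_decisions, video_decisions])):
    print(f"t={t}: fused class probabilities {p}")

In this toy setup, time step 0 is decided by audio alone, step 1 by video alone, and step 2 by the average of both, so the system always produces an estimate regardless of which channels are momentarily missing.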
