Evidence Theory-Based Multimodal Emotion Recognition

Automatic recognition of human affective states remains a largely unexplored and challenging topic. Even more issues arise when dealing with variable input quality or when aiming for real-time, unconstrained, and person-independent scenarios. In this paper, we explore audio-visual multimodal emotion recognition. We present SAMMI, a framework designed to extract real-time emotion appraisals from non-prototypical, person-independent facial expressions and vocal prosody. Different probabilistic fusion methods are compared and evaluated alongside a novel fusion technique called NNET. Results show that NNET improves the recognition score (CR+) by about 19% and the mean average precision by about 30% with respect to the best unimodal system.
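As a point of reference for the evidence-theoretic fusion the title alludes to, the sketch below shows Dempster's rule of combination, the standard operator for merging the outputs of two independent sources (e.g., an audio and a video emotion classifier) under Dempster-Shafer theory. This is a minimal illustration, not SAMMI's actual fusion scheme (the paper's NNET combiner is a neural network variant); the emotion labels and mass values are hypothetical.

```python
from itertools import product

def dempster_combine(m1, m2):
    """Combine two mass functions using Dempster's rule.

    Each mass function is a dict mapping a focal element
    (frozenset of hypotheses) to its assigned mass; masses sum to 1.
    """
    combined = {}
    conflict = 0.0
    for (b, mb), (c, mc) in product(m1.items(), m2.items()):
        inter = b & c
        if inter:
            # Intersecting focal elements reinforce each other.
            combined[inter] = combined.get(inter, 0.0) + mb * mc
        else:
            # Disjoint focal elements contribute to the conflict mass K.
            conflict += mb * mc
    if conflict >= 1.0:
        raise ValueError("total conflict: sources fully disagree")
    norm = 1.0 - conflict  # renormalise by (1 - K)
    return {a: v / norm for a, v in combined.items()}

# Hypothetical example: audio and video classifiers each assign mass
# over the frame of discernment {happy, angry}; mass on the full frame
# represents the source's remaining uncertainty.
audio = {frozenset({"happy"}): 0.6,
         frozenset({"happy", "angry"}): 0.4}
video = {frozenset({"happy"}): 0.5,
         frozenset({"angry"}): 0.3,
         frozenset({"happy", "angry"}): 0.2}

fused = dempster_combine(audio, video)
```

A key property worth noting: a source that is uncertain (mass on the whole frame) barely shifts the fused belief, whereas two sources that agree on a singleton sharply concentrate mass on it, which is what makes the rule attractive for combining modalities of unequal reliability.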
