Multimodal emotion recognition with automatic peak frame selection

In this paper, we present an effective framework for multimodal emotion recognition based on a novel approach for automatic peak frame selection from audio-visual video sequences. Given a video of an emotional expression, peak frames are those at which the emotion is at its apex. The objective of peak frame selection is to simplify training of the automatic emotion recognition system by summarizing the emotion expressed over a video sequence. The main steps of the proposed framework consist of video and audio feature extraction based on peak frame selection, unimodal classification, and decision-level fusion of the audio and visual results. We evaluated the performance of our approach on the eNTERFACE'05 audio-visual database, which contains six basic emotion classes. Experimental results demonstrate the effectiveness of the proposed system and its superiority over other methods in the literature.
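As a rough, hypothetical sketch (not the authors' implementation), the snippet below illustrates the two steps that distinguish such a pipeline: selecting peak frames from per-frame expression-intensity scores, and fusing unimodal class posteriors at the decision level. The intensity scores, the fusion weight, and all function names are assumptions introduced purely for illustration.

```python
import numpy as np

def select_peak_frames(frame_scores, k=5):
    """Return the indices of the k frames with the highest expression-intensity
    score. How the score is computed (e.g. from facial-landmark displacement
    relative to a neutral face) is a placeholder assumption here."""
    order = np.argsort(frame_scores)[::-1]      # highest score first
    return np.sort(order[:k])                   # keep temporal order

def decision_level_fusion(p_audio, p_visual, w_audio=0.5):
    """Combine per-class probabilities from the audio and visual classifiers
    with a weighted sum and return (predicted class index, fused posterior)."""
    fused = w_audio * np.asarray(p_audio) + (1.0 - w_audio) * np.asarray(p_visual)
    return int(np.argmax(fused)), fused

if __name__ == "__main__":
    # Toy example: 10 frames, 6 basic emotion classes (as in eNTERFACE'05).
    rng = np.random.default_rng(0)
    frame_scores = rng.random(10)               # stand-in per-frame intensity scores
    peaks = select_peak_frames(frame_scores, k=3)

    p_audio = rng.dirichlet(np.ones(6))         # stand-in unimodal posteriors
    p_visual = rng.dirichlet(np.ones(6))
    label, fused = decision_level_fusion(p_audio, p_visual, w_audio=0.4)
    print("peak frames:", peaks, "fused label:", label)
```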
