Audiovisual three-level fusion for continuous estimation of Russell's emotion circumplex

Predicting human emotions is attracting attention across many research areas that demand accurate predictions in uncontrolled scenarios. Despite this interest, current emotion recognition systems are still far from being as accurate as desired. Human emotions are commonly described in terms of two dimensions, valence and arousal, which span Russell's circumplex, where complex emotions lie. Accordingly, the Affect Recognition Sub-Challenge (ASC) of the third Audio/Visual Emotion and Depression Challenge, AVEC 2013, focuses on the continuous estimation of these two dimensions. This paper presents a three-level fusion system that combines single-regressor outputs from audio and visual features in order to maximize the average correlation over both dimensions. Five sets of features are extracted (three audio and two video) and merged following an iterative process. Results show that this fusion outperforms the baseline method on the challenge database.
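The abstract does not spell out the fusion rule itself; as a rough illustration of what an iterative, correlation-driven merge of per-feature-set regressor outputs could look like, the sketch below greedily averages prediction streams for one dimension (valence or arousal) and keeps a stream only when it raises the Pearson correlation with the ground truth. The function name `greedy_fusion`, the equal-weight averaging, and the five feature-set labels are illustrative assumptions, not the authors' actual implementation.

```python
import numpy as np
from scipy.stats import pearsonr

def greedy_fusion(predictions, target):
    """Greedily merge per-feature-set estimates, keeping a candidate only
    if it increases the Pearson correlation with the ground-truth dimension."""
    remaining = dict(predictions)
    # Seed the fusion with the single best-correlated feature set.
    best_name = max(remaining, key=lambda k: pearsonr(remaining[k], target)[0])
    fused = remaining.pop(best_name)
    selected = [best_name]
    best_corr = pearsonr(fused, target)[0]

    improved = True
    while improved and remaining:
        improved = False
        for name, pred in list(remaining.items()):
            # Equal-weight average of the already-selected streams plus the candidate.
            candidate = (fused * len(selected) + pred) / (len(selected) + 1)
            corr = pearsonr(candidate, target)[0]
            if corr > best_corr:
                fused, best_corr = candidate, corr
                selected.append(name)
                remaining.pop(name)
                improved = True
                break
    return fused, selected, best_corr

# Toy demonstration with synthetic frame-level valence estimates
# (hypothetical feature-set names; three audio streams and two video streams).
rng = np.random.default_rng(0)
valence_labels = rng.normal(size=500)
preds = {name: valence_labels + rng.normal(scale=s, size=500)
         for name, s in [("audio_mfcc", 0.8), ("audio_prosody", 1.0),
                         ("audio_spectral", 1.2), ("video_lbp", 0.9),
                         ("video_geometry", 1.1)]}
fused, used, corr = greedy_fusion(preds, valence_labels)
print(used, round(corr, 3))
```

In this kind of greedy scheme the final estimate only ever uses the feature sets that improve the validation correlation, which matches the paper's stated goal of maximizing the average correlation over valence and arousal.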
