AFEW-VA database for valence and arousal estimation in-the-wild

Continuous dimensional models of human affect, such as those based on valence and arousal, have been shown to describe a broader range of spontaneous, everyday emotions more accurately than the traditional models of discrete stereotypical emotion categories (e.g. happiness, surprise). However, most prior work on estimating valence and arousal has considered only laboratory settings and acted data, and it is unclear whether the findings of these studies still hold when the proposed methodologies are tested on data collected in-the-wild. In this paper we investigate exactly this question. We propose a new dataset of highly accurate per-frame annotations of valence and arousal for 600 challenging video clips extracted from feature films (also used in part for the AFEW dataset). For each video clip, we further provide per-frame annotations of 68 facial landmarks. We then evaluate a number of common baseline and state-of-the-art methods on both a commonly used laboratory recording dataset (the Semaine database) and the newly proposed recording set (AFEW-VA). Our results show that geometric features perform well independently of the settings. However, as expected, methods that perform well on constrained data do not necessarily generalise to uncontrolled data, and vice versa.

Highlights:
- AFEW-VA dataset for continuous valence and arousal estimation in-the-wild
- Review of existing work and databases for valence and arousal estimation from audio-video cues
- Baseline results and comparison of the performance of various features
- Comparison of state-of-the-art methods in controlled and unconstrained environments
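To make the annotation structure concrete, the sketch below shows how per-clip annotations of this kind might be loaded in Python. The on-disk layout it assumes (per-clip JSON files whose "frames" entries carry "valence", "arousal", and 68-point "landmarks" fields, stored under an AFEW-VA/ directory) is a hypothetical illustration, not a confirmed description of the dataset's release format.

```python
# A minimal sketch of parsing AFEW-VA-style per-frame annotations.
# The JSON layout assumed here (per-clip files with per-frame "valence",
# "arousal", and 68-point "landmarks" entries) is hypothetical.
import json
from pathlib import Path


def load_clip(path):
    """Return a list of per-frame (valence, arousal, landmarks) tuples for one clip."""
    with open(path) as f:
        clip = json.load(f)
    frames = []
    # Frame ids are assumed to be zero-padded strings, so lexicographic
    # sorting preserves temporal order.
    for frame_id in sorted(clip["frames"]):
        frame = clip["frames"][frame_id]
        frames.append((
            frame["valence"],             # discrete level, e.g. in [-10, 10]
            frame["arousal"],
            frame.get("landmarks"),       # 68 (x, y) points, if annotated
        ))
    return frames


if __name__ == "__main__":
    # Hypothetical directory layout: AFEW-VA/<clip_id>/<clip_id>.json
    for clip_path in sorted(Path("AFEW-VA").glob("*/*.json")):
        frames = load_clip(clip_path)
        print(f"{clip_path.stem}: {len(frames)} annotated frames")
```

From a structure like this, per-frame valence/arousal targets and landmark-based geometric features can be assembled directly into the input matrices that the evaluated baseline regressors expect.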
