Aff-Wild2: Extending the Aff-Wild Database for Affect Recognition

Automatic understanding of human affect from visual signals is a problem that has attracted significant interest over the past 20 years. However, human emotional states are quite complex. To appraise such states as they are displayed in real-world settings, we need expressive emotional descriptors capable of capturing and describing this complexity. The circumplex model of affect, which describes emotion in terms of valence (i.e., how positive or negative an emotion is) and arousal (i.e., the intensity of emotional activation), can be used for this purpose. Recent progress in the emotion recognition domain has been achieved through the development of deep neural architectures and the availability of very large training databases. To this end, Aff-Wild was the first large-scale "in-the-wild" database, containing around 1,200,000 frames. In this paper, we build upon this database, extending it with 260 more subjects and 1,413,000 new video frames. We call the union of Aff-Wild with this additional data Aff-Wild2. The videos are downloaded from YouTube and exhibit large variations in pose, age, illumination conditions, ethnicity, and profession. Both database-specific and cross-database experiments are performed in this paper, utilizing Aff-Wild2 along with the RECOLA database. The developed deep neural architectures are based on the joint training of state-of-the-art convolutional and recurrent neural networks with an attention mechanism, thus exploiting the invariant properties of convolutional features while modeling, via the recurrent layers, the temporal dynamics that arise in human behaviour. The obtained results show promise for the utilization of the extended Aff-Wild, as well as of the developed deep neural architectures, for visual analysis of human behaviour in terms of continuous emotion dimensions.
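
To make the architecture concrete, below is a minimal PyTorch sketch of one way to combine a CNN backbone, a recurrent layer, and attention for valence/arousal regression on face-cropped video clips. The ResNet-50 backbone, the layer sizes, and the attention-pooling arrangement are illustrative assumptions for the sketch, not the paper's exact configuration.

    # Minimal sketch (assumptions: ResNet-50 backbone, GRU, attention pooling;
    # not the paper's exact configuration).
    import torch
    import torch.nn as nn
    import torchvision.models as models

    class CnnRnnAttention(nn.Module):
        def __init__(self, hidden_size=128):
            super().__init__()
            backbone = models.resnet50(weights=None)
            # Drop the classification head; keep the 2048-d pooled features.
            self.cnn = nn.Sequential(*list(backbone.children())[:-1])
            self.rnn = nn.GRU(2048, hidden_size, batch_first=True)
            # Attention assigns one score per time step.
            self.attn = nn.Linear(hidden_size, 1)
            self.head = nn.Linear(hidden_size, 2)  # valence, arousal

        def forward(self, frames):
            # frames: (batch, time, 3, H, W) cropped face images
            b, t = frames.shape[:2]
            feats = self.cnn(frames.flatten(0, 1)).flatten(1)  # (b*t, 2048)
            feats = feats.view(b, t, -1)
            hidden, _ = self.rnn(feats)                        # (b, t, hidden)
            weights = torch.softmax(self.attn(hidden), dim=1)  # (b, t, 1)
            context = (weights * hidden).sum(dim=1)            # (b, hidden)
            return torch.tanh(self.head(context))              # in [-1, 1]

    model = CnnRnnAttention()
    clip = torch.randn(2, 8, 3, 112, 112)  # dummy batch: 2 clips x 8 frames
    va = model(clip)                       # (2, 2) valence/arousal estimates

In this sketch the attention weights pool the per-frame hidden states into a single clip-level estimate; a per-frame variant would instead apply the regression head to each recurrent hidden state.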
