A Multi-Task Learning & Generation Framework: Valence-Arousal, Action Units & Primary Expressions

Over the past few years, many research efforts have been devoted to the field of affect analysis. Various approaches have been proposed for: i) discrete emotion recognition in terms of the primary facial expressions; ii) emotion analysis in terms of facial Action Units (AUs), assuming a fixed expression intensity; iii) dimensional emotion analysis in terms of valence and arousal (VA). These approaches can only be effective if they are developed using large, appropriately annotated databases showing the behavior of people in-the-wild, i.e., in uncontrolled environments. Aff-Wild was the first large-scale, in-the-wild database (comprising around 1,200,000 frames of 300 videos) annotated in terms of VA. The annotation of the vast majority of existing emotion databases is limited to either primary expressions, valence-arousal, or action units. In this paper, we first annotate a part of the Aff-Wild database (around 234,000 frames) in terms of 8 AUs and another part (around 288,000 frames) in terms of the 7 basic emotion categories, so that parts of the database carry VA annotations together with either AU or primary-expression annotations. We then set up and tackle multi-task learning for emotion recognition, as well as for facial image generation. Multi-task learning is performed using: i) a deep neural network with shared hidden layers, which learns emotional attributes by exploiting their inter-dependencies; ii) the discriminator of a generative adversarial network (GAN). Image generation, in turn, is implemented through the generator of the GAN. For these two tasks, we carefully design loss functions that fit the examined set-up. Experiments are presented which illustrate the good performance of the proposed approach when applied to the newly annotated parts of the Aff-Wild database.
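The multi-task set-up described above combines three heterogeneous objectives: regression for valence-arousal, multi-label classification for the 8 AUs, and single-label classification for the 7 basic expressions. As a minimal sketch of how such objectives could be combined into one training loss, the function below sums a mean-squared-error term, a binary cross-entropy term, and a categorical cross-entropy term; the task weights `w_va`, `w_au`, and `w_expr` are illustrative hyper-parameters and not values taken from the paper.

```python
import numpy as np

def multi_task_loss(va_pred, va_true,
                    au_pred, au_true,
                    expr_pred, expr_true,
                    w_va=1.0, w_au=1.0, w_expr=1.0):
    """Weighted sum of the three task losses (illustrative sketch).

    va_*   : (batch, 2) valence-arousal values in [-1, 1]
    au_*   : (batch, 8) AU activations; predictions are sigmoid outputs
    expr_* : (batch, 7) expression distribution; predictions are softmax
             outputs, targets are one-hot vectors
    """
    eps = 1e-7  # guard against log(0)

    # Valence-arousal regression: mean squared error.
    l_va = np.mean((va_pred - va_true) ** 2)

    # Action units: binary cross-entropy, one sigmoid output per AU.
    au_p = np.clip(au_pred, eps, 1.0 - eps)
    l_au = -np.mean(au_true * np.log(au_p)
                    + (1.0 - au_true) * np.log(1.0 - au_p))

    # Basic expressions: categorical cross-entropy over softmax outputs.
    expr_p = np.clip(expr_pred, eps, 1.0 - eps)
    l_expr = -np.mean(np.sum(expr_true * np.log(expr_p), axis=-1))

    return w_va * l_va + w_au * l_au + w_expr * l_expr
```

In a shared-hidden-layer network, all three heads would be driven from the same trunk and this combined loss back-propagated jointly, which is one standard way to let the tasks exploit their inter-dependencies.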
