Prior Aided Streaming Network for Multi-task Affective Analysis

Automatic affective recognition has long been an important research topic in human-computer interaction (HCI). With recent advances in deep learning and large-scale annotated in-the-wild datasets, facial affect analysis now targets challenges in real-world settings. In this paper, we introduce our submission to the 2nd Affective Behavior Analysis in-the-wild (ABAW2) Competition. To handle the different emotion representations, namely Categorical Expression (EXPR), Action Units (AU), and Valence-Arousal (VA), we propose a multi-task streaming network built on the heuristic that the three representations are intrinsically associated with each other. In addition, we leverage an advanced facial expression embedding model as prior knowledge; it captures identity-invariant expression features while preserving expression similarities, and thus aids the downstream recognition tasks. To enhance the generalization ability of our model, we generate reliable pseudo labels for unsupervised training and adopt external datasets for fine-tuning. In the official test of the ABAW2 Competition, our method ranks first in the EXPR and AU tracks and second in the VA track. Extensive quantitative evaluations and ablation studies on the Aff-Wild2 dataset demonstrate the effectiveness of the proposed method.
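
The abstract does not spell out the network or the training objective, so the following is only a minimal sketch of how the three tracks (EXPR, AU, VA) could share a fixed expression-embedding prior. The class and head names (MultiTaskAffectModel, expr_head, au_head, va_head), the embedding dimension, the task dimensions, and the equal loss weights are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class MultiTaskAffectModel(nn.Module):
    """Sketch: a shared expression-embedding backbone feeding three task heads."""
    def __init__(self, backbone: nn.Module, embed_dim: int = 512,
                 num_expr: int = 7, num_au: int = 12):
        super().__init__()
        self.backbone = backbone                   # pretrained expression-embedding prior
        for p in self.backbone.parameters():
            p.requires_grad = False                # keep the prior fixed (assumption)
        self.expr_head = nn.Linear(embed_dim, num_expr)  # categorical expression logits
        self.au_head = nn.Linear(embed_dim, num_au)      # per-AU activation logits
        self.va_head = nn.Linear(embed_dim, 2)           # valence and arousal in [-1, 1]

    def forward(self, frames: torch.Tensor) -> dict:
        feat = self.backbone(frames)               # identity-invariant expression features
        return {
            "expr": self.expr_head(feat),
            "au": self.au_head(feat),
            "va": torch.tanh(self.va_head(feat)),
        }

def multitask_loss(out: dict, expr_gt: torch.Tensor,
                   au_gt: torch.Tensor, va_gt: torch.Tensor) -> torch.Tensor:
    # Joint loss over the three tracks; equal weights are illustrative only.
    l_expr = nn.functional.cross_entropy(out["expr"], expr_gt)            # class indices
    l_au = nn.functional.binary_cross_entropy_with_logits(out["au"], au_gt)  # float 0/1 labels
    l_va = nn.functional.mse_loss(out["va"], va_gt)                       # continuous targets
    return l_expr + l_au + l_va
```

In this reading, the frozen embedding supplies the shared "prior" features, while each lightweight head specializes to one emotion representation and the summed loss couples the three tasks during training.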
