Technical Report for Valence-Arousal Estimation in ABAW2 Challenge

In this work, we describe our method for the valence-arousal estimation track of the ABAW2 Competition at ICCV 2021. The competition organizers provide the in-the-wild Aff-Wild2 dataset for participants to analyze affective behavior in real-life settings. We use a two-stream model to learn emotion features from appearance and action, respectively. To address the data imbalance problem, we apply label distribution smoothing (LDS) to re-weight labels. Our proposed method achieves a Concordance Correlation Coefficient (CCC) of 0.591 for valence and 0.617 for arousal on the validation set of the Aff-Wild2 dataset.
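The two techniques named above can be sketched in NumPy. This is an illustrative implementation, not the authors' code: the CCC formula is standard, while for LDS the [-1, 1] label range, bin count, and Gaussian kernel width are assumptions chosen for this sketch.

```python
import numpy as np

def ccc(x, y):
    """Concordance Correlation Coefficient between predictions x and labels y."""
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return 2.0 * cov / (vx + vy + (mx - my) ** 2)

def lds_weights(labels, n_bins=100, sigma=2.0, kernel_size=5):
    """Label Distribution Smoothing: smooth the empirical label histogram with
    a Gaussian kernel, then weight each sample by the inverse smoothed density
    so that rare valence/arousal values contribute more to the loss."""
    bins = np.linspace(-1.0, 1.0, n_bins + 1)  # valence/arousal assumed in [-1, 1]
    hist, _ = np.histogram(labels, bins=bins)
    # Gaussian smoothing kernel applied to the label histogram
    t = np.arange(-kernel_size, kernel_size + 1)
    kernel = np.exp(-0.5 * (t / sigma) ** 2)
    kernel /= kernel.sum()
    smoothed = np.convolve(hist.astype(float), kernel, mode="same")
    # Map each label back to its bin and take the inverse smoothed density
    idx = np.clip(np.digitize(labels, bins) - 1, 0, n_bins - 1)
    w = 1.0 / np.maximum(smoothed[idx], 1e-6)
    return w / w.mean()  # normalize so the mean weight is 1
```

In training, `lds_weights` would multiply the per-sample regression loss (e.g. a CCC- or MSE-based loss), and `ccc` is the evaluation metric reported on the validation set.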
