Multiple Transfer Learning and Multi-label Balanced Training Strategies for Facial AU Detection In the Wild

This paper presents the SIAT-NTU solution and results for facial action unit (AU) detection in the EmotioNet Challenge 2020. The task is to detect 23 AUs from facial images in the wild, and its main difficulties lie in the imbalanced AU distribution and in learning discriminative features. We tackle these difficulties from the following aspects. First, to address the unconstrained heterogeneity of in-the-wild images, we detect and align faces with multi-task cascaded convolutional networks (MTCNN). Second, using multiple transfer strategies, we pre-train large CNNs on related datasets, e.g. face recognition and facial expression datasets, and fine-tune them on the EmotioNet dataset. Third, we employ a multi-label balanced sampling strategy and a weighted loss to mitigate the imbalance problem. Last but not least, to further improve performance, we ensemble multiple models and optimize the decision threshold for each AU. Our solution achieves an accuracy of 90.13% and an F1 score of 44.10% in the final test phase. Our code is available at: https://github.com/kaiwang960112/enc2020_au_detection
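
To make two of the strategies above concrete, the sketch below illustrates a frequency-weighted loss for the 23 AU labels and a per-AU decision-threshold search on validation predictions. It is a minimal sketch assuming a PyTorch setup; the function names, the inverse-frequency weighting scheme, and the threshold grid are illustrative assumptions, not the authors' released implementation (see the linked repository for that).

```python
# Hypothetical sketch: weighted BCE loss + per-AU threshold tuning for
# multi-label AU detection. Not the authors' code.
import numpy as np
import torch
import torch.nn as nn
from sklearn.metrics import f1_score

NUM_AUS = 23  # number of action units in the challenge


def make_weighted_bce(train_labels: np.ndarray) -> nn.BCEWithLogitsLoss:
    """Build a BCE loss whose positive-class weights are the inverse of each
    AU's occurrence frequency, so rare AUs contribute more to the loss."""
    pos_freq = train_labels.mean(axis=0).clip(min=1e-6)                 # shape (23,)
    pos_weight = torch.tensor((1.0 - pos_freq) / pos_freq, dtype=torch.float32)
    return nn.BCEWithLogitsLoss(pos_weight=pos_weight)


def tune_thresholds(val_probs: np.ndarray, val_labels: np.ndarray) -> np.ndarray:
    """Grid-search a separate decision threshold for every AU that maximizes
    its F1 score on held-out validation predictions."""
    thresholds = np.full(NUM_AUS, 0.5)
    for au in range(NUM_AUS):
        best_f1, best_t = 0.0, 0.5
        for t in np.arange(0.05, 0.96, 0.05):
            preds = (val_probs[:, au] >= t).astype(int)
            f1 = f1_score(val_labels[:, au], preds, zero_division=0)
            if f1 > best_f1:
                best_f1, best_t = f1, t
        thresholds[au] = best_t
    return thresholds
```

In this sketch the loss weighting and the threshold search operate per AU, which matches the abstract's description of handling label imbalance and optimizing a threshold for each AU; the ensembling step would simply average the per-model probabilities before the threshold search.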
