Multi-Scale Region with Local Relationship Learning for Facial Action Unit Detection

Recently, an effective approach to the complex problem of facial expression recognition has been to describe each expression in terms of the action units (AUs) defined by the Facial Action Coding System (FACS). Many methods have been proposed for AU detection, but the task remains very challenging because facial AUs vary in size and shape. To locate AUs, many methods therefore first detect facial landmarks and then detect AUs according to the landmark positions. With this approach, however, the accuracy of landmark detection strongly affects the AU detection result, which makes such methods less robust. In this paper, we use a multi-scale structure (MTL) to handle the varying sizes of facial AU regions without relying on facial landmark information. In addition, because facial AU regions are strongly correlated, we propose a region relationship learning structure (LRT) that uses rich local information to learn the relationships between local facial regions. The proposed end-to-end Multi-Scale Region with Local Relationship Learning network (MTLRT-Net) is lightweight and has low hardware requirements, and extensive experiments on the BP4D and DISFA datasets demonstrate that it outperforms state-of-the-art methods.
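
The abstract does not say how MTL forms its regions or how LRT computes relationships, so the following is only a minimal sketch of the general idea under stated assumptions, not the MTLRT-Net implementation. It assumes that regions are uniform n x n grid cells at several scales (so no landmark detector is needed), that each region is summarized by per-channel average pooling, and that relationships are modeled by a dot-product softmax weighting with a residual connection. The function names `multiscale_regions`, `relate_regions` and `cell_bounds` are hypothetical.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]   # one channel, H x W
Vector = List[float]


def cell_bounds(size: int, n: int) -> List[int]:
    """Integer cut points that split `size` into n nearly equal parts."""
    return [size * i // n for i in range(n + 1)]


def multiscale_regions(fmap: Sequence[Matrix], scales: Sequence[int]) -> List[Vector]:
    """Average-pool a C x H x W feature map over an n x n grid for each scale n.

    Assumption: regions are fixed grid cells, so no facial landmarks are used.
    Returns one C-dimensional descriptor per cell, in the order of `scales`.
    """
    h, w = len(fmap[0]), len(fmap[0][0])
    regions: List[Vector] = []
    for n in scales:
        if not 1 <= n <= min(h, w):
            raise ValueError(f"scale {n} does not fit a {h}x{w} map")
        rb, cb = cell_bounds(h, n), cell_bounds(w, n)
        for i in range(n):
            for j in range(n):
                desc = []
                for channel in fmap:
                    cell = [v for row in channel[rb[i]:rb[i + 1]]
                            for v in row[cb[j]:cb[j + 1]]]
                    desc.append(sum(cell) / len(cell))
                regions.append(desc)
    return regions


def relate_regions(regions: Sequence[Vector]) -> List[Vector]:
    """Refine each region with a softmax-weighted mix of all regions.

    Assumption: relationships are dot-product similarities (no learned
    weights here); the mixed context is added back to each descriptor.
    """
    out: List[Vector] = []
    for q in regions:
        scores = [sum(a * b for a, b in zip(q, k)) for k in regions]
        top = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        context = [sum(wt * k[c] for wt, k in zip(weights, regions)) / total
                   for c in range(len(q))]
        out.append([a + b for a, b in zip(q, context)])  # residual connection
    return out


if __name__ == "__main__":
    # Toy 2-channel 4x4 feature map: channel 0 is a ramp, channel 1 is constant.
    fmap = [
        [[float(r * 4 + c) for c in range(4)] for r in range(4)],
        [[1.0] * 4 for _ in range(4)],
    ]
    regions = multiscale_regions(fmap, scales=(1, 2, 4))
    refined = relate_regions(regions)
    print(len(regions), len(refined[0]))  # 21 regions (1 + 4 + 16), 2 channels each
```

Coarse grids capture large AU areas and fine grids capture small ones, which is one simple way to cover regions of different sizes without landmarks; a trained network would apply these steps to CNN feature maps and use learned projections rather than raw dot products.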
