Semantic Relationships Guided Representation Learning for Facial Action Unit Recognition

Facial action unit (AU) recognition is a crucial task for facial expression analysis and has attracted extensive attention in artificial intelligence and computer vision. Existing work has either focused on designing or learning complex regional feature representations, or delved into various types of AU relationship modeling. Despite varying degrees of progress, existing methods still struggle to handle complex situations. In this paper, we investigate how to integrate semantic relationship propagation between AUs into a deep neural network framework to enhance the feature representation of facial regions, and propose an AU semantic relationship embedded representation learning (SRERL) framework. Specifically, by analyzing the symbiosis and mutual exclusion of AUs in various facial expressions, we organize the facial AUs in the form of a structured knowledge graph and integrate a Gated Graph Neural Network (GGNN) into a multi-scale CNN framework to propagate node information through the graph and generate enhanced AU representations. Because the learned features encode both appearance characteristics and AU relationship reasoning, the proposed model is more robust and can cope with more challenging cases, e.g., illumination changes and partial occlusion. Extensive experiments on two public benchmarks demonstrate that our method outperforms previous work and achieves state-of-the-art performance.
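To make the graph-propagation idea concrete, below is a minimal PyTorch sketch of a GGNN-style update over AU nodes: each AU's regional CNN feature is treated as a node state, messages are aggregated over a relationship graph, and a GRU cell updates the states. The class name, the learnable adjacency initialization, and the step count are illustrative assumptions, not the authors' exact configuration; in the paper's setting the graph would be derived from AU co-occurrence (symbiosis) and mutual-exclusion statistics.

```python
# A minimal sketch of GGNN-style propagation over AU nodes (hypothetical,
# not the authors' reference implementation).
import torch
import torch.nn as nn

class AUGraphPropagation(nn.Module):
    def __init__(self, num_aus: int, feat_dim: int, num_steps: int = 2):
        super().__init__()
        self.num_steps = num_steps
        # Assumed: a learnable adjacency initialized to identity; the paper
        # builds it from AU symbiosis/mutual-exclusion statistics instead.
        self.adjacency = nn.Parameter(torch.eye(num_aus))
        self.message = nn.Linear(feat_dim, feat_dim)
        self.gru = nn.GRUCell(feat_dim, feat_dim)

    def forward(self, node_feats: torch.Tensor) -> torch.Tensor:
        # node_feats: (batch, num_aus, feat_dim), one regional feature per AU.
        b, n, d = node_feats.shape
        h = node_feats
        for _ in range(self.num_steps):
            # Aggregate neighbor messages, weighted by the relationship graph.
            m = torch.einsum('ij,bjd->bid', self.adjacency, self.message(h))
            # GRUCell expects 2-D tensors, so fold the node axis into the batch.
            h = self.gru(m.reshape(b * n, d), h.reshape(b * n, d)).reshape(b, n, d)
        return h  # enhanced AU representations, fed to per-AU classifiers

# Hypothetical usage: given 12 AUs with 512-d regional features,
# feats = torch.randn(8, 12, 512)
# enhanced = AUGraphPropagation(num_aus=12, feat_dim=512)(feats)
```

In this sketch the enhanced node states would be passed to per-AU binary classifiers, so each AU's prediction benefits from the states of related AUs rather than from its regional appearance alone.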
