GeoConv: Geodesic Guided Convolution for Facial Action Unit Recognition

Automatic facial action unit (AU) recognition has attracted considerable attention but remains challenging, because the subtle changes of local facial muscles are difficult to capture thoroughly. Most existing AU recognition approaches exploit geometric information in a straightforward 2D or 3D manner, and therefore either ignore 3D manifold information or incur high computational costs. In this paper, we propose a novel geodesic guided convolution (GeoConv) for AU recognition that embeds 3D manifold information into 2D convolutions. Specifically, the kernel of GeoConv is weighted by our introduced geodesic weights, which are negatively correlated with geodesic distances on a coarsely reconstructed 3D face model. Based on GeoConv, we further develop an end-to-end trainable framework named GeoCNN for AU recognition. Extensive experiments on the BP4D and DISFA benchmarks show that our approach significantly outperforms state-of-the-art AU recognition methods.
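To make the idea of a geodesically weighted kernel concrete, here is a minimal NumPy sketch, not the authors' implementation. It assumes that a tensor `dist` holding the geodesic distance from each pixel to each neighbor in its k×k window has already been computed on the reconstructed face mesh (for example with the heat method), and it assumes an exponential decay `exp(-alpha * d)` as the weight function. The names `geodesic_weights`, `geo_conv2d`, and `alpha` are hypothetical. Each kernel tap is scaled by the weight of the neighbor it touches, so pixels that are adjacent in the image but far apart on the face surface contribute less.

```python
import numpy as np


def geodesic_weights(dist, alpha=1.0):
    """Map geodesic distances to weights in (0, 1]: larger distance -> smaller weight.

    The exponential form and `alpha` are illustrative assumptions; the only
    property taken from the text is the negative correlation with distance.
    """
    return np.exp(-alpha * dist)


def geo_conv2d(x, kernel, dist, alpha=1.0):
    """Single-channel 'same'-size convolution with a per-location reweighted kernel.

    x:      (H, W) input feature map
    kernel: (k, k) kernel, k odd
    dist:   (H, W, k, k) assumed-precomputed geodesic distances from each
            pixel to every neighbor in its k x k window
    Computes cross-correlation, as deep learning frameworks do for "convolution".
    """
    H, W = x.shape
    k = kernel.shape[0]
    r = k // 2
    xp = np.pad(x, r)  # zero padding keeps the output at H x W
    w = geodesic_weights(dist, alpha)
    out = np.zeros((H, W), dtype=float)
    for i in range(H):
        for j in range(W):
            patch = xp[i:i + k, j:j + k]  # k x k window centered on (i, j)
            out[i, j] = np.sum(kernel * w[i, j] * patch)
    return out


# Usage: with all distances zero, every weight is 1 and this is a plain 3x3 convolution.
x = np.ones((4, 4))
kernel = np.ones((3, 3))
print(geo_conv2d(x, kernel, np.zeros((4, 4, 3, 3))))
```

When every distance is zero the operation reduces to an ordinary convolution, so the geodesic weights act purely as a surface-aware modulation of a standard kernel. A practical layer would be multi-channel and vectorized (for example via an unfold-style patch extraction) rather than looping over pixels.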
