Combining Local and Global Features for 3D Face Tracking

In this paper, we propose to combine local and global features in a carefully designed convolutional neural network for 3D face alignment. We firstly adopt a part heatmap regression network to predict the landmark points on a local granularity by generating a series of heatmaps for each 3D landmark point. To enhance the ability of local feature representation, we incorporate the designed network with a part attention module, which transfers the convolution opereation into a channelwise attention opereation. Additionally, we take all these heatmaps alongside the input image as the input of another shape regression network in order to model the feature representations from local discrete regions to a global semantically continuous space. Extensive experiments on challenging datasets, AFLW2000-3D, 300VW and the Menpo Benchmark, show the effectiveness of both the global consistency and local description in our model, and the proposed algorithm outperforms state-of-the-art baselines.

[1]  Georgios Tzimiropoulos,et al.  Binarized Convolutional Landmark Localizers for Human Pose Estimation and Face Alignment with Limited Resources , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[2]  Erik Learned-Miller,et al.  FDDB: A benchmark for face detection in unconstrained settings , 2010 .

[3]  Shiguang Shan,et al.  Coarse-to-Fine Auto-Encoder Networks (CFAN) for Real-Time Face Alignment , 2014, ECCV.

[4]  Gang Wang,et al.  Recurrent Attentional Networks for Saliency Detection , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[5]  Yorgos Tzimiropoulos,et al.  Bulat , Adrian and Tzimiropoulos , Georgios ( 2016 ) Convolutional aggregation of local evidence for large pose face alignment , 2017 .

[6]  Zhiao Huang,et al.  Associative Embedding: End-to-End Learning for Joint Detection and Grouping , 2016, NIPS.

[7]  Stefanos Zafeiriou,et al.  A 3D Morphable Model Learnt from 10,000 Faces , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[8]  Hwann-Tzong Chen,et al.  Self Adversarial Training for Human Pose Estimation , 2017, 2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC).

[9]  Koray Kavukcuoglu,et al.  Visual Attention , 2020, Computational Models for Cognitive Vision.

[10]  Xiaogang Wang,et al.  Deep Convolutional Network Cascade for Facial Point Detection , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[11]  Xiaoming Liu,et al.  Pose-Invariant Face Alignment via CNN-Based Dense 3D Model Fitting , 2017, International Journal of Computer Vision.

[12]  Jia Deng,et al.  Stacked Hourglass Networks for Human Pose Estimation , 2016, ECCV.

[13]  Xiaoming Liu,et al.  Large-Pose Face Alignment via CNN-Based Dense 3D Model Fitting , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[14]  Jian Sun,et al.  Identity Mappings in Deep Residual Networks , 2016, ECCV.

[15]  Jian Sun,et al.  Face Alignment at 3000 FPS via Regressing Local Binary Features , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[16]  Fernando De la Torre,et al.  Supervised Descent Method and Its Applications to Face Alignment , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[17]  Josephine Sullivan,et al.  One millisecond face alignment with an ensemble of regression trees , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[18]  Xiaogang Wang,et al.  Multi-context Attention for Human Pose Estimation , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[19]  Georgios Tzimiropoulos,et al.  How Far are We from Solving the 2D & 3D Face Alignment Problem? (and a Dataset of 230,000 3D Facial Landmarks) , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[20]  Horst Bischof,et al.  Annotated Facial Landmarks in the Wild: A large-scale, real-world database for facial landmark localization , 2011, 2011 IEEE International Conference on Computer Vision Workshops (ICCV Workshops).

[21]  Yuxin Peng,et al.  The application of two-level attention models in deep convolutional neural network for fine-grained image classification , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[22]  William J. Christmas,et al.  Fitting 3D Morphable Face Models using local features , 2015, 2015 IEEE International Conference on Image Processing (ICIP).

[23]  Zhiao Huang,et al.  Coarse-to-fine Face Alignment with Multi-Scale Local Patch Regression , 2015, ArXiv.

[24]  Thomas Vetter,et al.  A morphable model for the synthesis of 3D faces , 1999, SIGGRAPH.

[25]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[26]  Junjie Yan,et al.  Learn to Combine Multiple Hypotheses for Accurate Face Alignment , 2013, 2013 IEEE International Conference on Computer Vision Workshops.

[27]  Xiangyu Zhu,et al.  Discriminative 3D morphable model fitting , 2015, 2015 11th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (FG).

[28]  Georgios Tzimiropoulos,et al.  Project-Out Cascaded Regression with an application to face alignment , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[29]  Yuning Jiang,et al.  Extensive Facial Landmark Localization with Coarse-to-Fine Convolutional Network Cascade , 2013, 2013 IEEE International Conference on Computer Vision Workshops.

[30]  Jian Sun,et al.  Face Alignment by Explicit Shape Regression , 2012, International Journal of Computer Vision.

[31]  Fernando De la Torre,et al.  Global supervised descent method , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[32]  George Trigeorgis,et al.  3D Face Morphable Models "In-the-Wild" , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[33]  Cheng Li,et al.  Face alignment by coarse-to-fine shape searching , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[34]  Qingshan Liu,et al.  M3 CSR: Multi-view, multi-scale and multi-component cascade shape regression , 2016, Image Vis. Comput..

[35]  Stefanos Zafeiriou,et al.  The First Facial Landmark Tracking in-the-Wild Challenge: Benchmark and Results , 2015, 2015 IEEE International Conference on Computer Vision Workshop (ICCVW).

[36]  Pietro Perona,et al.  Robust Face Landmark Estimation under Occlusion , 2013, 2013 IEEE International Conference on Computer Vision.

[37]  Haoqiang Fan,et al.  Approaching human level facial landmark localization by deep learning , 2016, Image Vis. Comput..

[38]  Xiangyu Zhu,et al.  Face Alignment in Full Pose Range: A 3D Total Solution , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[39]  Georgios Tzimiropoulos,et al.  Two-Stage Convolutional Part Heatmap Regression for the 1st 3D Face Alignment in the Wild (3DFAW) Challenge , 2016, ECCV Workshops.

[40]  Dumitru Erhan,et al.  Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[41]  Ashraf A. Kassim,et al.  3D-Assisted Coarse-to-Fine Extreme-Pose Facial Landmark Detection , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[42]  Stefanos Zafeiriou,et al.  300 Faces In-The-Wild Challenge: database and results , 2016, Image Vis. Comput..

[43]  Xiaogang Wang,et al.  Residual Attention Network for Image Classification , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[44]  Jitendra Malik,et al.  Deformable part models are convolutional neural networks , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).