Dilated Skip Convolution for Facial Landmark Detection

Facial landmark detection has attracted enormous interest because it underpins facial analysis tasks such as face recognition, cartoon generation, face tracking and facial expression analysis. Many methods have been proposed to address the main difficulties of localizing facial landmarks in images, including large appearance variations and partial occlusion. These methods differ in how they use the facial appearance and shape information of the input image. In our work, we consider facial information in both local and global contexts. In the first stage we aim for pixel-level accuracy from local-context information; in the second stage we integrate this with global-context information, namely the spatial relationships between key points across the whole image. The pipeline of our architecture therefore consists of two main components: (1) a local-context subnet, which generates detection heatmaps via fully convolutional DenseNets with additional kernel convolution filters, and (2) a dilated skip convolution subnet, a combination of dilated convolutions and skip connections, which robustly refines the local appearance heatmaps. With this architecture, we demonstrate that our approach achieves state-of-the-art performance on challenging datasets, including LFPW, HELEN, 300W and AFLW2000-3D, by leveraging fully convolutional DenseNets, skip connections and dilated convolutions without further post-processing.
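To make the idea behind the second component concrete, the following is a minimal one-dimensional sketch of stacking dilated convolutions with skip connections. It is not the authors' network: the dilation rates `DILATIONS`, the all-ones `KERNEL`, and the helper names are assumptions chosen only for illustration, and the real subnet operates on 2D heatmaps with learned weights. The sketch shows how the context seen by each output position grows quickly with depth while the skip connection carries the original signal forward.

```python
# Illustrative sketch only: a 1D stand-in for a dilated convolution stack
# with skip connections. Dilation rates and kernel weights are assumed.

def dilated_conv1d(signal, kernel, dilation):
    """Zero-padded, same-length 1D dilated convolution (cross-correlation)."""
    center = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + (k - center) * dilation  # taps are spaced `dilation` apart
            if 0 <= j < len(signal):
                acc += w * signal[j]
        out.append(acc)
    return out


def dilated_skip_block(signal, kernel, dilation):
    """Dilated convolution whose output is added to its input (skip connection)."""
    conv = dilated_conv1d(signal, kernel, dilation)
    return [s + c for s, c in zip(signal, conv)]


def receptive_field(kernel_size, dilations):
    """Receptive field of stacked stride-1 dilated convolutions."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


DILATIONS = [1, 2, 4, 8]   # assumed rates, not taken from the paper
KERNEL = [1.0, 1.0, 1.0]   # fixed weights for illustration; real ones are learned

# Feed a single impulse through the stack and see how far it spreads.
x = [0.0] * 41
x[20] = 1.0
y = x
for d in DILATIONS:
    y = dilated_skip_block(y, KERNEL, d)

support = [i for i, v in enumerate(y) if v != 0.0]
print(receptive_field(len(KERNEL), DILATIONS), len(support))
```

With four layers of width-3 kernels, the impulse influences 31 output positions, matching the receptive-field formula; four undilated layers of the same width would cover only 9.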
