Face alignment using a deep neural network with local feature learning and recurrent regression

Abstract We propose a face alignment method that uses a deep neural network employing both local feature learning and recurrent regression. The method is primarily based on a convolutional neural network (CNN), which automatically learns local feature descriptors from a local facial landmark dataset that we created. Our research is motivated by the belief that investigating a face from its low-level component features would produce more competitive face alignment results, just as a CNN is normally trained to automatically learn a feature hierarchy from the lowest to the highest levels of abstraction. Moreover, by training the feature extraction layers and the regression layers separately, we impose an explicit functional separation between the feature extraction and regression tasks. First, we train a feature extraction network to classify the landmark patches in the dataset. Using this pre-trained feature extraction network, we build a face alignment network that takes an entire face image, rather than a local landmark patch, as input and thus generates global facial features. The subsequent local feature extraction layer extracts a local feature set from this global feature, generating the local feature descriptors. In the space of these descriptors, the network learns a generic descent direction from the currently estimated landmark positions to the ground truth via linear regression applied recurrently. A head pose estimation network is also applied to provide a good initial estimate to the local feature extraction layer for accurate convergence. We found that learning good local landmark features in pursuit of good landmark classification also leads to higher face alignment accuracy and achieves state-of-the-art performance on several public benchmark datasets. This signifies the importance of learning not only the global features but also the local features for face alignment.
We further verify our method's effectiveness when applied to related problems such as head pose estimation, facial landmark tracking, and invisible landmark detection. We believe that good local learning enables a deeper understanding of the face or object, resulting in higher performance.
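
The recurrent regression step described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names (`extract_local_features`, `recurrent_regression`), the window size, the number of steps, and the choice to reuse a single linear regressor (`W_reg`, `b_reg`) at every iteration are all assumptions made for illustration. In the paper the global feature map comes from a pre-trained CNN and the initial landmarks from a head pose estimation network; here both are plain arrays supplied by the caller.

```python
import numpy as np


def extract_local_features(feature_map, landmarks, half=1):
    """Crop a (2*half+1) x (2*half+1) window around each landmark.

    feature_map: (H, W, C) global feature array (hypothetical CNN output).
    landmarks:   (N, 2) array of (x, y) positions in feature-map pixels.
    Returns one flat vector concatenating all N local descriptors.
    """
    H, W, _ = feature_map.shape
    # Edge-pad so windows near the border keep a fixed size.
    padded = np.pad(feature_map, ((half, half), (half, half), (0, 0)), mode="edge")
    feats = []
    for x, y in landmarks:
        # Clamp to the map, then shift by the padding offset.
        cx = int(np.clip(round(float(x)), 0, W - 1)) + half
        cy = int(np.clip(round(float(y)), 0, H - 1)) + half
        patch = padded[cy - half:cy + half + 1, cx - half:cx + half + 1, :]
        feats.append(patch.ravel())
    return np.concatenate(feats)


def recurrent_regression(feature_map, init_landmarks, W_reg, b_reg, steps=3):
    """Refine landmarks by applying the same linear regressor repeatedly.

    Each step computes delta = W_reg @ phi(landmarks) + b_reg and adds it to
    the current estimate. Sharing W_reg and b_reg across steps is an
    assumption of this sketch.
    """
    landmarks = np.asarray(init_landmarks, dtype=float).copy()
    for _ in range(steps):
        phi = extract_local_features(feature_map, landmarks)
        delta = W_reg @ phi + b_reg
        landmarks += delta.reshape(-1, 2)
    return landmarks
```

With `N` landmarks, `C` channels, and the default `half=1`, the descriptor has `N * 9 * C` entries, so `W_reg` has shape `(2 * N, N * 9 * C)` and `b_reg` has shape `(2 * N,)`. In a trained system these weights would be fitted so that each update moves the estimate toward the ground-truth landmark positions.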
