Deep Cascaded Regression for Face Alignment

We propose a novel cascaded regression framework for face alignment based on a deep convolutional neural network (CNN). In most existing cascaded regression methods, the shape-indexed features are either computed with hand-crafted visual descriptors or learned by shallow models, which may be suboptimal for the face alignment task. To address this problem, we propose an end-to-end CNN architecture that learns highly discriminative shape-indexed features. First, our deep architecture encodes the image into high-level feature maps of the same size as the image via three main operations: convolution, pooling, and deconvolution. Then, we propose "Shape-Indexed Pooling" to extract deep features from these high-level descriptors. We refine the shape via sequential regressions on the deep shape-indexed features, which demonstrates outstanding performance. We also propose to learn a probability mask for each landmark, which can be used to choose the initialization from the shape space. Extensive evaluations on several benchmark datasets demonstrate that the proposed deep framework significantly improves over state-of-the-art methods.
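As a rough illustration of the pipeline described above, the sketch below pools features around each landmark from image-sized feature maps and then applies a sequence of regressors to update the shape. It is a minimal sketch under stated assumptions, not the implementation from the paper: the abstract does not define the pooling window or the regressor form, so the square max-pooling window, the `radius` parameter, the linear regressors `(W_t, b_t)`, and the function names `shape_indexed_pooling` and `cascaded_refine` are all hypothetical choices made for illustration.

```python
import numpy as np


def shape_indexed_pooling(feature_maps, shape, radius=1):
    """Max-pool a square window around each landmark in every channel.

    Assumption: a (2*radius+1)^2 max-pooling window; the paper's exact
    pooling operator is not given in the abstract.

    feature_maps: array of shape (C, H, W), same spatial size as the image.
    shape: array-like of shape (N, 2), landmarks as (x, y) pixel coordinates.
    Returns a feature vector of length N * C.
    """
    C, H, W = feature_maps.shape
    pooled = []
    for x, y in np.rint(shape).astype(int):
        # Clamp the landmark to the image so the window is never empty.
        x = int(np.clip(x, 0, W - 1))
        y = int(np.clip(y, 0, H - 1))
        y0, y1 = max(y - radius, 0), min(y + radius + 1, H)
        x0, x1 = max(x - radius, 0), min(x + radius + 1, W)
        pooled.append(feature_maps[:, y0:y1, x0:x1].max(axis=(1, 2)))
    return np.concatenate(pooled)


def cascaded_refine(feature_maps, init_shape, regressors, radius=1):
    """Refine a shape with sequential regressions on shape-indexed features.

    Assumption: each stage is a linear map (W_t, b_t) from the pooled
    feature vector to a shape increment of length 2 * N.
    """
    shape = np.asarray(init_shape, dtype=float).copy()
    for W_t, b_t in regressors:
        phi = shape_indexed_pooling(feature_maps, shape, radius)
        shape += (W_t @ phi + b_t).reshape(shape.shape)
    return shape
```

Because the features are re-extracted at the current shape estimate in every stage, each regressor sees descriptors indexed by the latest landmark positions, which is the defining property of cascaded shape regression.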
