KEPLER: Simultaneous estimation of keypoints and 3D pose of unconstrained faces in a unified framework by learning efficient H-CNN regressors

Abstract Keypoint detection is one of the most important pre-processing steps in tasks such as face modeling, recognition and verification. In this paper, we present an iterative method for Keypoint Estimation and Pose prediction of unconstrained faces by Learning Efficient H-CNN Regressors (KEPLER) for addressing the unconstrained face alignment problem. Recent state-of-the-art methods have shown improvements in facial keypoint detection by employing Convolution Neural Networks (CNNs). Although a simple feed forward neural network can learn the mapping between input and output spaces, it does not learn the inherent structural dependencies that well. We present a novel architecture called H-CNN (Heatmap-CNN) acting on an N-dimensional input image which captures informative structured global and local features and thus favors accurate keypoint detecion in in-the wild face images. H-CNN is jointly trained on the visibility, fiducials and 3D-pose of the face. As the iterations proceed, the error decreases making the gradients small and thus requiring efficient training of deep networks to mitigate this. KEPLER performs global corrections in pose and fiducials for the first four iterations followed by local corrections at a later stage. As a by-product, KEPLER also provides robust estimate of 3D pose (pitch, yaw and roll) of the face. We also show that without using any 3D information, KEPLER outperforms recent state-of-the-art methods for alignment on challenging datasets such as AFW [1] and AFLW [2].

[1]  Stefanos Zafeiriou,et al.  Active Pictorial Structures , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[2]  George Trigeorgis,et al.  Adaptive cascaded regression , 2016, 2016 IEEE International Conference on Image Processing (ICIP).

[3]  Donghoon Lee,et al.  Face alignment using cascade Gaussian process regression trees , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[4]  George Trigeorgis,et al.  Mnemonic Descent Method: A Recurrent Process Applied for End-to-End Face Alignment , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[5]  Matti Pietikäinen,et al.  Face Description with Local Binary Patterns: Application to Face Recognition , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[6]  Bill Triggs,et al.  Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[7]  Deva Ramanan,et al.  Face detection, pose estimation, and landmark localization in the wild , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[8]  Pietro Perona,et al.  Robust Face Landmark Estimation under Occlusion , 2013, 2013 IEEE International Conference on Computer Vision.

[9]  Stefanos Zafeiriou,et al.  Incremental Face Alignment in the Wild , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[10]  Georgios Tzimiropoulos,et al.  How Far are We from Solving the 2D & 3D Face Alignment Problem? (and a Dataset of 230,000 3D Facial Landmarks) , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[11]  Xiaoming Liu,et al.  Large-Pose Face Alignment via CNN-Based Dense 3D Model Fitting , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[12]  Xiaoou Tang,et al.  Facial Landmark Detection by Deep Multi-task Learning , 2014, ECCV.

[13]  Stefanos Zafeiriou,et al.  Robust Discriminative Response Map Fitting with Constrained Local Models , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[14]  Shiguang Shan,et al.  Coarse-to-Fine Auto-Encoder Networks (CFAN) for Real-Time Face Alignment , 2014, ECCV.

[15]  Qiang Ji,et al.  Robust Facial Landmark Detection Under Significant Head Poses and Occlusion , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[16]  Xiaoming Liu,et al.  Pose-Invariant 3D Face Alignment , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[17]  Cheng Li,et al.  Face alignment by coarse-to-fine shape searching , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[18]  Cheng Li,et al.  Unconstrained Face Alignment via Cascaded Compositional Learning , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[19]  David G. Lowe,et al.  Distinctive Image Features from Scale-Invariant Keypoints , 2004, International Journal of Computer Vision.

[20]  Xiaogang Wang,et al.  Deep Convolutional Network Cascade for Facial Point Detection , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[21]  Feng Liu,et al.  Joint Face Alignment and 3D Face Reconstruction , 2016, ECCV.

[22]  Jitendra Malik,et al.  Human Pose Estimation with Iterative Error Feedback , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[23]  Josephine Sullivan,et al.  One millisecond face alignment with an ensemble of regression trees , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[24]  José Miguel Buenaposada,et al.  Head-Pose Estimation In-the-Wild Using a Random Forest , 2016, AMDO.

[25]  Jian Sun,et al.  Face Alignment by Explicit Shape Regression , 2012, International Journal of Computer Vision.

[26]  Maja Pantic,et al.  Gauss-Newton Deformable Part Models for Face Alignment In-the-Wild , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[27]  Jian Sun,et al.  Face Alignment Via Component-Based Discriminative Search , 2008, ECCV.

[28]  Jian Sun,et al.  Face Alignment at 3000 FPS via Regressing Local Binary Features , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[29]  Li Fei-Fei,et al.  ImageNet: A large-scale hierarchical image database , 2009, CVPR.

[30]  Horst Bischof,et al.  Annotated Facial Landmarks in the Wild: A large-scale, real-world database for facial landmark localization , 2011, 2011 IEEE International Conference on Computer Vision Workshops (ICCV Workshops).

[31]  Fernando De la Torre,et al.  Global supervised descent method , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[32]  Anil K. Jain,et al.  Pushing the frontiers of unconstrained face detection and recognition: IARPA Janus Benchmark A , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[33]  Shih-Chieh Huang,et al.  Regressive Tree Structured Model for Facial Landmark Localization , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[34]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).