L2, 1-based regression and prediction accumulation across views for robust facial landmark detection

We propose a new methodology for facial landmark detection. Similar to other state-of-the-art methods, we rely on the use of cascaded regression to perform inference, and we use a feature representation that results from concatenating 66 HOG descriptors, one per landmark. However, we propose a novel regression method that substitutes the commonly used Least Squares regressor. This new method makes use of the L2,1 norm, and it is designed to increase the robustness of the regressor to poor initialisations (e.g., due to large out of plane head poses) or partial occlusions. Furthermore, we propose to use multiple initialisations, consisting of both spatial translation and 4 head poses corresponding to different pan rotations. These estimates are aggregated into a single prediction in a robust manner. Both strategies are designed to improve the convergence behaviour of the algorithm, so that it can cope with the challenges of in-the-wild data. We further detail some important experimental details, and show extensive performance comparisons highlighting the performance improvement attained by the method proposed here. Facial landmark detection using a cascade of regressorsNew regression model based on L21 normMultiple initialisations are used to improve robustness.The method is evaluated on multiple datasets and on the 300W challenge.

[1]  Stefanos Zafeiriou,et al.  A Semi-automatic Methodology for Facial Landmark Annotation , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition Workshops.

[2]  G. G. Stokes "J." , 1890, The New Yale Book of Quotations.

[3]  Georgios Tzimiropoulos,et al.  Project-Out Cascaded Regression with an application to face alignment , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[4]  B. Heisele Face Detection , 2001 .

[5]  Maja Pantic,et al.  Optimization Problems for Fast AAM Fitting in-the-Wild , 2013, 2013 IEEE International Conference on Computer Vision.

[6]  Maja Pantic,et al.  Empirical analysis of cascade deformable models for multi-view face detection , 2013, Image Vis. Comput..

[7]  Junjie Yan,et al.  Learn to Combine Multiple Hypotheses for Accurate Face Alignment , 2013, 2013 IEEE International Conference on Computer Vision Workshops.

[8]  Chih-Jen Lin,et al.  LIBSVM: A library for support vector machines , 2011, TIST.

[9]  David J. Kriegman,et al.  Localizing parts of faces using a consensus of exemplars , 2011, CVPR.

[10]  Stefanos Zafeiriou,et al.  Robust Discriminative Response Map Fitting with Constrained Local Models , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[11]  Jieping Ye,et al.  Multi-Task Feature Learning Via Efficient l2, 1-Norm Minimization , 2009, UAI.

[12]  David A. McAllester,et al.  Object Detection with Discriminatively Trained Part Based Models , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[13]  Fernando De la Torre,et al.  Supervised Descent Method and Its Applications to Face Alignment , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[14]  Josephine Sullivan,et al.  One millisecond face alignment with an ensemble of regression trees , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[15]  Xiaogang Wang,et al.  Deep Convolutional Network Cascade for Facial Point Detection , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[16]  Claudia Lindner,et al.  Robust and Accurate Shape Model Matching Using Random Forest Regression-Voting. , 2015, IEEE transactions on pattern analysis and machine intelligence.

[17]  Deva Ramanan,et al.  Face detection, pose estimation, and landmark localization in the wild , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[18]  Timothy F. Cootes,et al.  Active Appearance Models , 2001, IEEE Trans. Pattern Anal. Mach. Intell..

[19]  Simon Lucey,et al.  Deformable Model Fitting by Regularized Landmark Mean-Shift , 2010, International Journal of Computer Vision.

[20]  Jian Sun,et al.  Face Alignment at 3000 FPS via Regressing Local Binary Features , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[21]  Pietro Perona,et al.  Cascaded pose regression , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[22]  Stefanos Zafeiriou,et al.  300 Faces in-the-Wild Challenge: The First Facial Landmark Localization Challenge , 2013, 2013 IEEE International Conference on Computer Vision Workshops.

[23]  Jian Sun,et al.  Face Alignment by Explicit Shape Regression , 2012, International Journal of Computer Vision.

[24]  Shuicheng Yan,et al.  Towards Multi-view and Partially-Occluded Face Alignment , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[25]  Thomas S. Huang,et al.  Interactive Facial Feature Localization , 2012, ECCV.

[26]  Pietro Perona,et al.  Robust Face Landmark Estimation under Occlusion , 2013, 2013 IEEE International Conference on Computer Vision.

[27]  Stefanos Zafeiriou,et al.  Incremental Face Alignment in the Wild , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[28]  Maja Pantic,et al.  Local Evidence Aggregation for Regression-Based Facial Point Detection , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[29]  Maja Pantic,et al.  Facial point detection using boosted regression and graph models , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[30]  Bill Triggs,et al.  Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[31]  Timothy F. Cootes,et al.  Training Models of Shape from Sets of Examples , 1992, BMVC.

[32]  Simon Baker,et al.  Active Appearance Models Revisited , 2004, International Journal of Computer Vision.