Sieving Regression Forest Votes for Facial Feature Detection in the Wild

In this paper we propose a method for localizing multiple facial features in challenging face images. In the regression forests (RF) framework, observations (patches) extracted at several image locations cast votes for the locations of several facial features. To filter out irrelevant votes, we pass them through two types of sieves that are organised in a cascade and enforce geometric constraints. The first sieve filters out votes that are not consistent with a hypothesis for the location of the face center. Several sieves of the second type, one associated with each individual facial point, filter out distant votes. We propose a method that adjusts the proximity threshold of each second-type sieve on the fly by applying a classifier which, based on middle-level features extracted from the voting maps for the facial feature in question, makes a sequence of decisions on whether the threshold should be reduced. We validate the proposed method on two challenging datasets of images collected from the Internet, on which we obtain state-of-the-art results without resorting to explicit facial shape models. We also show the benefits of our proximity threshold adjustment, especially on 'difficult' face images.
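The sieve cascade described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the array layout, the interpretation of a "distant" vote as one with a long patch-to-feature offset, and the toy spread-based rule standing in for the trained classifier are all assumptions made for illustration.

```python
import numpy as np


def face_center_sieve(patch_xy, center_offsets, center_hyp, tau_center):
    """First sieve: keep patches whose face-center vote lies within
    tau_center pixels of the face-center hypothesis."""
    center_votes = patch_xy + center_offsets
    dist = np.linalg.norm(center_votes - center_hyp, axis=1)
    return dist <= tau_center


def proximity_sieve(feature_offsets, threshold):
    """Second sieve: keep votes whose offset is no longer than threshold
    (assumed meaning of 'distant votes')."""
    return np.linalg.norm(feature_offsets, axis=1) <= threshold


def adjust_threshold(patch_xy, feature_offsets, thresholds, should_reduce):
    """Walk a decreasing list of thresholds, asking should_reduce (a
    hypothetical stand-in for the classifier) whether to go one step lower."""
    current = thresholds[0]
    for nxt in thresholds[1:]:
        keep = proximity_sieve(feature_offsets, current)
        votes = patch_xy[keep] + feature_offsets[keep]
        # Never reduce to a threshold that would discard every vote.
        if not proximity_sieve(feature_offsets, nxt).any():
            break
        if not should_reduce(votes):
            break
        current = nxt
    return current


def spread_rule(votes, max_spread=1.0):
    """Toy decision rule (assumption): reduce while votes are scattered.
    The paper instead applies a classifier to voting-map features."""
    return votes.std(axis=0).max() > max_spread


def localize(patch_xy, center_offsets, feature_offsets, center_hyp,
             tau_center, thresholds, should_reduce):
    """Run both sieves for one facial point; return (estimate, threshold)."""
    keep1 = face_center_sieve(patch_xy, center_offsets, center_hyp, tau_center)
    xy, off = patch_xy[keep1], feature_offsets[keep1]
    lam = adjust_threshold(xy, off, thresholds, should_reduce)
    keep2 = proximity_sieve(off, lam)
    if not keep2.any():
        return None, lam
    votes = xy[keep2] + off[keep2]
    return votes.mean(axis=0), lam


# Synthetic example: true feature at (50, 50), face center at (60, 60).
patch_xy = np.array([[48, 50], [52, 50], [50, 47], [10, 10], [90, 90]], dtype=float)
feature_offsets = np.array([[2, 0], [-2, 0], [0, 3], [45, 35], [-30, -30]], dtype=float)
center_offsets = np.array([[12, 10], [8, 10], [10, 13], [50, 50], [40, 40]], dtype=float)
center_hyp = np.array([60.0, 60.0])

mask1 = face_center_sieve(patch_xy, center_offsets, center_hyp, 5.0)
estimate, lam = localize(patch_xy, center_offsets, feature_offsets,
                         center_hyp, 5.0, [80.0, 40.0, 10.0], spread_rule)
print(estimate, lam)
```

In this toy data the last patch is rejected by the first sieve because its face-center vote disagrees with the hypothesis, and the far-away fourth patch is rejected once the proximity threshold is reduced. The mean of the surviving votes is used here only as a simple aggregate; the source does not state how the final location is computed from the sieved votes.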
