Multiview Facial Landmark Localization in RGB-D Images via Hierarchical Regression With Binary Patterns

In this paper, we propose a real-time system for multiview facial landmark localization in RGB-D images. We formulate facial landmark localization as a regression problem that estimates both the head pose and the landmark positions, and we handle the high-dimensional regression output with a coarse-to-fine approach. First, the 3-D face position and rotation are estimated from the depth observation via a random regression forest. Next, the 3-D pose is refined by fusing it with the estimate from the RGB observation. Finally, the landmarks are located from the RGB observation with gradient boosted decision trees in a pose-conditional model. The benefits of the proposed framework are twofold. First, pose estimation and landmark localization are solved with hierarchical regression, unlike previous approaches that iteratively optimize the pose and landmark locations and therefore rely heavily on the initial pose estimate. Second, because the RGB and depth cues have different characteristics, they are used at different stages of landmark localization and combined in a robust manner. In the experiments, we show that the proposed approach outperforms state-of-the-art algorithms on facial landmark localization with RGB-D input.
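
The three-stage control flow described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the regressors are stand-in callables rather than trained forests or boosted trees, the pose is reduced to three rotation angles (the 3-D position is omitted), the fusion rule is an assumed weighted average, and the pose-conditional model is assumed to pick one landmark regressor per yaw bin. All names (`HierarchicalLocalizer`, `fuse_poses`, `pose_bin`, `yaw_edges`, `depth_weight`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple

Pose = Tuple[float, float, float]      # (yaw, pitch, roll) in degrees; assumed layout
Landmarks = List[Tuple[float, float]]  # 2-D landmark positions in pixels


def fuse_poses(depth_pose: Pose, rgb_pose: Pose, depth_weight: float = 0.5) -> Pose:
    """Assumed fusion rule: per-angle weighted average of the two estimates.

    This ignores angle wrap-around, which is acceptable only for small head
    rotations; the abstract does not specify the actual fusion method.
    """
    w = depth_weight
    return tuple(w * d + (1.0 - w) * r for d, r in zip(depth_pose, rgb_pose))


def pose_bin(yaw: float, edges: Sequence[float]) -> int:
    """Return the index of the yaw interval containing `yaw` (edges sorted ascending)."""
    return sum(1 for edge in edges if yaw >= edge)


@dataclass
class HierarchicalLocalizer:
    # Stage 1 stand-in: depth image -> coarse pose (a random regression forest in the paper).
    depth_pose_regressor: Callable[[object], Pose]
    # Stage 2 stand-in: RGB image -> pose estimate used for refinement.
    rgb_pose_regressor: Callable[[object], Pose]
    # Stage 3 stand-ins: one landmark regressor per yaw bin (boosted trees in the paper).
    landmark_regressors: Dict[int, Callable[[object, Pose], Landmarks]]
    yaw_edges: Sequence[float]
    depth_weight: float = 0.5

    def localize(self, rgb: object, depth: object) -> Tuple[Pose, Landmarks]:
        coarse = self.depth_pose_regressor(depth)              # coarse pose from depth
        refined = fuse_poses(coarse, self.rgb_pose_regressor(rgb),
                             self.depth_weight)                # refine with the RGB estimate
        regressor = self.landmark_regressors[pose_bin(refined[0], self.yaw_edges)]
        return refined, regressor(rgb, refined)                # pose-conditional landmarks
```

Because each stage runs once and feeds the next, there is no iterative pose and landmark optimization loop in this sketch, which mirrors the hierarchical design the abstract contrasts with earlier approaches.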
