Random Forests for Real Time 3D Face Analysis

We present a random forest-based framework for real-time head pose estimation from depth images and extend it to localize a set of facial features in 3D. Our algorithm takes a voting approach in which each patch extracted from the depth image directly casts a vote for the head pose or for each of the facial features. The system handles large rotations, partial occlusions, and the noisy depth data acquired with commercial sensors. Moreover, the algorithm works on each frame independently and achieves real-time performance without resorting to parallel computation on a GPU. We report extensive experiments on challenging, publicly available datasets and introduce a new annotated head pose database recorded with a Microsoft Kinect.
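To make the voting idea concrete, the following is a minimal sketch of how per-patch votes for a single 3D location (for example, the nose tip) might be combined into one estimate. It is an illustration under stated assumptions, not the aggregation used in the paper: the function name `aggregate_votes`, the `max_dist` threshold, and the median-then-mean scheme are all hypothetical choices made for this example, and the votes are assumed to have already been produced by the forest.

```python
from statistics import median


def aggregate_votes(votes, max_dist=20.0):
    """Combine per-patch 3D votes into one estimate (illustrative only).

    votes    -- list of (x, y, z) tuples, one per patch (assumed input)
    max_dist -- hypothetical outlier radius, in the same units as the votes
    """
    if not votes:
        raise ValueError("at least one vote is required")

    # Robust centre: coordinate-wise median of all votes.
    centre = tuple(median(v[i] for v in votes) for i in range(3))

    # Keep only votes that lie within max_dist of the robust centre.
    def dist(v):
        return sum((v[i] - centre[i]) ** 2 for i in range(3)) ** 0.5

    inliers = [v for v in votes if dist(v) <= max_dist]
    if not inliers:
        # No vote is close to the median; fall back to the median itself.
        return centre

    # Final estimate: mean of the remaining votes.
    n = len(inliers)
    return tuple(sum(v[i] for v in inliers) / n for i in range(3))


# Five consistent votes and one gross outlier (e.g. from an occluding patch).
example_votes = [
    (0.0, 0.0, 100.0),
    (2.0, 0.0, 100.0),
    (-2.0, 0.0, 100.0),
    (0.0, 3.0, 100.0),
    (0.0, -3.0, 100.0),
    (500.0, 500.0, 500.0),
]
estimate = aggregate_votes(example_votes)
```

Because every patch votes independently, a few patches falling on an occluder or on noisy depth values can be outvoted by the rest, which is the intuition behind the robustness to partial occlusion described above.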


[38]  Anil K. Jain,et al.  Automatic feature extraction for multiview 3D face recognition , 2006, 7th International Conference on Automatic Face and Gesture Recognition (FGR06).

[39]  Ioannis A. Kakadiaris,et al.  Three-Dimensional Face Recognition in the Presence of Facial Expressions: An Annotated Deformable Model Approach , 2007, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[40]  Matthew Turk,et al.  A Morphable Model For The Synthesis Of 3D Faces , 1999, SIGGRAPH.

[41]  Juergen Gall,et al.  Class-specific Hough forests for object detection , 2009, CVPR.

[42]  Luc Van Gool,et al.  Fast 3D Scanning with Automatic Motion Compensation , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[43]  Leo Breiman,et al.  Random Forests , 2001, Machine Learning.

[44]  Chin-Seng Chua,et al.  Facial feature detection and face recognition from 2D and 3D images , 2002, Pattern Recognit. Lett..

[45]  Martin Breidt,et al.  Robust semantic analysis by synthesis of 3D facial motion , 2011, Face and Gesture 2011.

[46]  Andrea Cavallaro,et al.  3-D Face Detection, Landmark Localization, and Registration Using a Point Distribution Model , 2009, IEEE Transactions on Multimedia.

[47]  Sven Behnke,et al.  Feature-based head pose estimation from images , 2007, 2007 7th IEEE-RAS International Conference on Humanoid Robots.

[48]  Lijun Yin,et al.  Automatic pose estimation of 3D facial models , 2008, 2008 19th International Conference on Pattern Recognition.

[49]  Wei-Yin Loh,et al.  Classification and regression trees , 2011, WIREs Data Mining Knowl. Discov..

[50]  Jim Austin,et al.  Binary neural network based 3D facial feature localization , 2009, 2009 International Joint Conference on Neural Networks.

[51]  Luc Van Gool,et al.  Real Time Head Pose Estimation from Consumer Depth Cameras , 2011, DAGM-Symposium.

[52]  Patrick J. Flynn,et al.  Multiple Nose Region Matching for 3D Face Recognition under Varying Facial Expression , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[53]  David J. Kriegman,et al.  Localizing parts of faces using a consensus of exemplars , 2011, CVPR.

[54]  Horst Bischof,et al.  3D-MAM: 3D morphable appearance model for efficient fine head pose estimation from still images , 2009, 2009 IEEE 12th International Conference on Computer Vision Workshops, ICCV Workshops.

[55]  Thomas Vetter,et al.  Optimal landmark detection using shape models and branch and bound , 2011, 2011 International Conference on Computer Vision.

[56]  Chin Seng Chua,et al.  Point Signatures: A New Representation for 3D Object Recognition , 1997, International Journal of Computer Vision.

[57]  Tomaso A. Poggio,et al.  A general framework for object detection , 1998, Sixth International Conference on Computer Vision (IEEE Cat. No.98CH36271).

[58]  Luc Van Gool,et al.  Hough Forests for Object Detection, Tracking, and Action Recognition , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[59]  Paul A. Viola,et al.  Robust Real-Time Face Detection , 2001, Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001.

[60]  Rainer Stiefelhagen,et al.  Head pose estimation using stereo vision for human-robot interaction , 2004, Sixth IEEE International Conference on Automatic Face and Gesture Recognition, 2004. Proceedings..

[61]  Andrew Zisserman,et al.  Hello! My name is... Buffy'' -- Automatic Naming of Characters in TV Video , 2006, BMVC.

[62]  Anil K. Jain,et al.  Detection of Anchor Points for 3D Face Veri.cation , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05) - Workshops.

[63]  Chitra Dorai,et al.  COSMOS - A Representation Scheme for 3D Free-Form Objects , 1997, IEEE Trans. Pattern Anal. Mach. Intell..

[64]  Trevor Darrell,et al.  Pose estimation using 3D view-based eigenspaces , 2003, 2003 IEEE International SOI Conference. Proceedings (Cat. No.03CH37443).

[65]  Mohan M. Trivedi,et al.  Head Pose Estimation in Computer Vision: A Survey , 2009, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[66]  Paul J. Besl,et al.  A Method for Registration of 3-D Shapes , 1992, IEEE Trans. Pattern Anal. Mach. Intell..

[67]  Luc Van Gool,et al.  A 3-D Audio-Visual Corpus of Affective Communication , 2010, IEEE Transactions on Multimedia.

[68]  Luc Van Gool,et al.  Face/Off: live facial puppetry , 2009, SCA '09.

[69]  Chi Fang,et al.  Head Pose Estimation Based on Random Forests for Multiclass Classification , 2010, 2010 20th International Conference on Pattern Recognition.

[70]  Sami Romdhani,et al.  A 3D Face Model for Pose and Illumination Invariant Face Recognition , 2009, 2009 Sixth IEEE International Conference on Advanced Video and Signal Based Surveillance.

[71]  Leonidas J. Guibas,et al.  Robust single-view geometry and motion reconstruction , 2009, ACM Trans. Graph..

[72]  Paul A. Viola,et al.  Fast Multi-view Face Detection , 2003 .

[73]  K. Walker,et al.  View-based active appearance models , 2000, Proceedings Fourth IEEE International Conference on Automatic Face and Gesture Recognition (Cat. No. PR00580).

[74]  Maurício Pamplona Segundo,et al.  Automatic Face Segmentation and Facial Landmark Detection in Range Images , 2010, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics).

[75]  Jun Wang,et al.  A 3D facial expression database for facial behavior research , 2006, 7th International Conference on Automatic Face and Gesture Recognition (FGR06).

[76]  Yuxiao Hu,et al.  Head pose estimation using Fisher Manifold learning , 2003, 2003 IEEE International SOI Conference. Proceedings (Cat. No.03CH37443).

[77]  Mohammed Bennamoun,et al.  Automatic 3D Face Detection, Normalization and Recognition , 2006, Third International Symposium on 3D Data Processing, Visualization, and Transmission (3DPVT'06).