Real time head pose estimation with random regression forests

Fast and reliable algorithms for estimating the head pose are essential for many applications and higher-level face analysis tasks. We address the problem of head pose estimation from depth data, which can be captured using the increasingly affordable 3D sensing technologies available today. To achieve robustness, we formulate pose estimation as a regression problem. While detecting specific face parts like the nose is sensitive to occlusions, learning the regression on rather generic surface patches requires an enormous amount of training data in order to achieve accurate estimates. We propose to use random regression forests for the task at hand, given their capability to handle large training datasets. Moreover, we synthesize a large amount of annotated training data using a statistical model of the human face. In our experiments, we show that our approach can handle real data presenting large pose changes, partial occlusions, and facial expressions, even though it is trained only on synthetic neutral face data. We have thoroughly evaluated our system on a publicly available database, on which we achieve state-of-the-art performance without having to resort to the graphics card.
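
To make the idea of patch-based regression concrete, the following is a minimal sketch of a random regression forest in which every patch casts a vote for the pose and the votes are aggregated. It is an illustration under stated assumptions, not the system described above: the 1D "patches" produced by the hypothetical helper `make_patch` are a toy stand-in for depth data, only a single angle (yaw) is regressed, splits are chosen by variance reduction, and votes are combined by plain averaging. All function names and parameter values are made up for this example.

```python
import random


def make_patch(yaw_deg, rng, size=8, noise=0.02):
    # Hypothetical toy "depth patch": a 1D profile whose slope grows with yaw.
    slope = yaw_deg / 90.0
    return [slope * x + rng.gauss(0.0, noise) for x in range(size)]


def mean(values):
    return sum(values) / len(values)


def sum_sq_dev(values):
    # Sum of squared deviations from the mean (variance times sample count).
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def grow_tree(samples, rng, depth=0, max_depth=6, min_leaf=5, n_tests=20):
    # samples is a list of (patch, yaw) pairs; a leaf stores the mean yaw.
    yaws = [yaw for _, yaw in samples]
    if depth >= max_depth or len(samples) < 2 * min_leaf:
        return {"leaf": mean(yaws)}
    size = len(samples[0][0])
    best = None
    for _ in range(n_tests):
        # Random binary test (an assumption): compare two patch entries.
        i, j = rng.randrange(size), rng.randrange(size)
        diffs = [patch[i] - patch[j] for patch, _ in samples]
        tau = rng.choice(diffs)
        left = [s for s, d in zip(samples, diffs) if d < tau]
        right = [s for s, d in zip(samples, diffs) if d >= tau]
        if len(left) < min_leaf or len(right) < min_leaf:
            continue
        # Keep the test that leaves the least yaw spread in the two children.
        score = (sum_sq_dev([y for _, y in left])
                 + sum_sq_dev([y for _, y in right]))
        if best is None or score < best[0]:
            best = (score, i, j, tau, left, right)
    if best is None:
        return {"leaf": mean(yaws)}
    _, i, j, tau, left, right = best
    return {
        "i": i, "j": j, "tau": tau,
        "left": grow_tree(left, rng, depth + 1, max_depth, min_leaf, n_tests),
        "right": grow_tree(right, rng, depth + 1, max_depth, min_leaf, n_tests),
    }


def train_forest(samples, n_trees=5, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        # Each tree sees a bootstrap resample of the training set.
        subset = [rng.choice(samples) for _ in samples]
        forest.append(grow_tree(subset, rng))
    return forest


def predict_tree(node, patch):
    while "leaf" not in node:
        go_left = patch[node["i"]] - patch[node["j"]] < node["tau"]
        node = node["left"] if go_left else node["right"]
    return node["leaf"]


def estimate_pose(forest, patches):
    # Every patch votes through every tree; the votes are simply averaged.
    return mean([predict_tree(tree, p) for tree in forest for p in patches])


if __name__ == "__main__":
    rng = random.Random(1)
    train = []
    for _ in range(600):
        yaw = rng.uniform(-60.0, 60.0)
        train.append((make_patch(yaw, rng), yaw))
    forest = train_forest(train)
    test_patches = [make_patch(30.0, rng) for _ in range(20)]
    print("estimated yaw:", round(estimate_pose(forest, test_patches), 1))
```

Because the estimate pools votes from many patches, a few corrupted or missing patches shift it only slightly, which is the intuition behind regressing from generic patches rather than from a single detected face part.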