Sparse-MVRVMs Tree for Fast and Accurate Head Pose Estimation in the Wild

Head pose estimation is an important problem in computer vision and facial analysis. We model head pose estimation as a regression problem in which the three rotation angles (yaw, pitch, roll) are functions of the face appearance. We exploit this by learning the appearance of the face with a tree cascade of sparse Multi-Variate Relevance Vector Machines (MVRVMs). We evaluated our approach on two challenging datasets, YouTube Faces and the Point and Shoot Challenge (PaSC) dataset. On the PaSC dataset, our method estimates yaw, pitch, and roll with a mean error below 5° and an error tolerance below ±4°. The method is not computationally expensive: one prediction takes around 6 milliseconds, which makes it suitable for real-time applications, including those running at high frame rates.
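The idea of regressing all three angles jointly through a tree of regressors can be illustrated with a minimal sketch. The abstract does not describe the tree layout or the routing rule, so everything here is an assumption made for illustration: a two-level tree routed by the sign of the coarse yaw estimate, multi-output ridge regression used as a simple stand-in for each MVRVM node, and synthetic feature vectors in place of real face descriptors. The names `fit_ridge`, `fit_tree`, and `predict_tree` are hypothetical.

```python
import numpy as np

def fit_ridge(X, Y, lam=1e-3):
    """Multi-output ridge regression (stand-in for an MVRVM node, not an RVM)."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])   # append a bias column
    A = Xb.T @ Xb + lam * np.eye(Xb.shape[1])
    return np.linalg.solve(A, Xb.T @ Y)             # weights, shape (d + 1, 3)

def predict_ridge(W, X):
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return Xb @ W                                   # columns: yaw, pitch, roll

def fit_tree(X, Y, yaw_split=0.0):
    """Assumed layout: one root regressor and two children split on yaw."""
    left = Y[:, 0] < yaw_split
    return {
        "root": fit_ridge(X, Y),
        "split": yaw_split,
        "left": fit_ridge(X[left], Y[left]),
        "right": fit_ridge(X[~left], Y[~left]),
    }

def predict_tree(tree, X):
    """Coarse estimate at the root, then refinement by the selected child."""
    coarse = predict_ridge(tree["root"], X)
    go_left = coarse[:, 0] < tree["split"]
    out = np.empty_like(coarse)
    out[go_left] = predict_ridge(tree["left"], X[go_left])
    out[~go_left] = predict_ridge(tree["right"], X[~go_left])
    return out

# Synthetic data: 16-dimensional "appearance" features, 3 target angles.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 16))
W_true = rng.normal(size=(16, 3))
Y = X @ W_true + 0.1 * rng.normal(size=(200, 3))

tree = fit_tree(X, Y)
pred = predict_tree(tree, X)
mae = np.abs(pred - Y).mean(axis=0)  # mean absolute error per angle
```

Here `mae` holds one mean absolute error per angle, which is the kind of per-angle figure the abstract reports. A sparse MVRVM node would additionally prune most of its basis functions during training, which is what keeps prediction cheap; the ridge stand-in does not model that.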
