Training-based head pose estimation under monocular vision

Although many 3D head pose estimation methods based on monocular vision can achieve an accuracy of 5°, reducing the number of required training samples and avoiding the use of hardware parameters as input features remain among the biggest challenges in the field. To address these challenges, the authors propose an accurate head pose estimation method that can act as an extension to facial key point detection systems. The basic idea is to use the normalised distances between key points as input features, and to use l1-minimisation to select a sparse set of training samples that reflects the mapping between the feature vector space and the head pose space. The head pose of the test sample is then represented as a linear combination of the head poses corresponding to these samples. The experimental results show that the authors' method can achieve an accuracy of 2.6° without any extra hardware parameters or information about the subject. In addition, the method is still able to estimate the head pose under large head movement and varying illumination.
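The pipeline described above can be illustrated with a minimal sketch. It is not the authors' implementation. It makes several assumptions: key points are 2D, distances are normalised by the largest pairwise distance, and the l1-minimisation is solved in its penalised (lasso) form with a plain iterative soft-thresholding loop. The function names and the parameters `lam` and `n_iter` are hypothetical.

```python
import numpy as np


def normalised_distances(keypoints):
    """Pairwise key point distances divided by the largest distance.

    Assumption: normalising by the maximum distance; the abstract does not
    say which normalisation the authors use.
    """
    pts = np.asarray(keypoints, dtype=float)
    diffs = pts[:, None, :] - pts[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    upper = np.triu_indices(len(pts), k=1)  # each unordered pair once
    d = dists[upper]
    return d / d.max()


def l1_sparse_weights(A, b, lam=1e-3, n_iter=5000):
    """Minimise 0.5 * ||A w - b||^2 + lam * ||w||_1 by soft-thresholding.

    Assumption: this simple solver stands in for whatever l1 solver the
    authors actually use.
    """
    step = 1.0 / np.linalg.norm(A, 2) ** 2  # 1 / (largest singular value)^2
    w = np.zeros(A.shape[1])
    for _ in range(n_iter):
        z = w - step * (A.T @ (A @ w - b))  # gradient step on the data term
        w = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # shrink
    return w


def estimate_pose(train_features, train_poses, test_feature, lam=1e-3):
    """Estimate a pose as a sparse linear combination of training poses.

    train_features: (n_samples, n_features) array
    train_poses:    (n_samples, 3) array, e.g. yaw, pitch, roll in degrees
    """
    A = np.asarray(train_features, dtype=float).T  # columns = samples
    b = np.asarray(test_feature, dtype=float)
    w = l1_sparse_weights(A, b, lam)
    pose = np.asarray(train_poses, dtype=float).T @ w
    return pose, w
```

Because the features are ratios of distances, they do not change when the face is uniformly rescaled in the image, which is one way to avoid camera parameters as inputs. The weight vector `w` returned by `estimate_pose` shows which training samples were selected; most entries are expected to be zero when `lam` is large enough.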
