Head Tracking via Invariant Keypoint Learning

Keypoint matching is a standard tool for solving the correspondence problem in vision applications. In 3D face tracking, however, this approach often falls short: the complexity of the human face, combined with the rich viewpoint, non-rigid expression, and lighting variations found in typical applications, produces appearance changes that existing keypoint detectors and descriptors cannot handle. In this paper, we propose a new approach that tailors keypoint matching to tracking the 3D pose of the user's head in a video stream. The core idea is to learn keypoints that are explicitly invariant to these challenging transformations. First, we select keypoints that are stable under randomly drawn small viewpoint changes, non-rigid deformations, and illumination changes. Then, we treat keypoint descriptor learning at different large angles as an incremental scheme to learn discriminative descriptors. At matching time, to reduce the ratio of outlier correspondences, we use second-order color information to prune keypoints unlikely to lie on the face. Moreover, we integrate optical flow correspondences adaptively to remove motion jitter efficiently. Extensive experiments show that the proposed approach yields fast, robust, and accurate 3D head tracking, even in very challenging scenarios.
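As a rough illustration of the color-based pruning step, the sketch below filters keypoints by comparing the covariance of the RGB values around each keypoint with a reference covariance. The abstract does not say which second-order statistic or distance is used, so the 3x3 color covariance, the log-Euclidean distance, the patch size, the threshold, and all function names here are assumptions made for illustration, not the authors' method.

```python
import numpy as np

def color_covariance(patch):
    """Return the 3x3 covariance of the RGB values in an (H, W, 3) patch."""
    pixels = patch.reshape(-1, 3).astype(np.float64)
    return np.cov(pixels, rowvar=False)

def covariance_distance(cov_a, cov_b, eps=1e-6):
    """Log-Euclidean distance between two symmetric covariance matrices (assumed metric)."""
    def log_spd(cov):
        vals, vecs = np.linalg.eigh(cov)
        vals = np.maximum(vals, eps)  # clamp so the logarithm is defined
        return (vecs * np.log(vals)) @ vecs.T
    return np.linalg.norm(log_spd(cov_a) - log_spd(cov_b), ord="fro")

def prune_keypoints(image, keypoints, reference_cov, half_size=4, max_dist=2.0):
    """Keep (x, y) keypoints whose local color covariance is close to reference_cov.

    half_size and max_dist are hypothetical tuning parameters.
    """
    height, width = image.shape[:2]
    kept = []
    for x, y in keypoints:
        inside = (half_size <= x < width - half_size
                  and half_size <= y < height - half_size)
        if not inside:
            continue  # the patch would fall outside the image
        patch = image[y - half_size:y + half_size + 1,
                      x - half_size:x + half_size + 1]
        dist = covariance_distance(color_covariance(patch), reference_cov)
        if dist <= max_dist:
            kept.append((x, y))
    return kept
```

In practice the reference covariance would be estimated from pixels known to lie on the face, for example in the first tracked frame, and the threshold would be tuned on held-out data.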
