Locating Nose-Tips and Estimating Head Poses in Images by Tensorposes

This paper introduces a head pose estimation system that automatically localizes the nose-tip of a face and simultaneously estimates the head pose in an image. In the training stage, the nose-tips of the faces are first manually labeled. The appearance variations caused by head pose changes are then characterized by a tensorposes model. Given an image with unknown head pose and nose-tip location, the system first applies skin color segmentation and then localizes the nose-tip in a coarse-to-fine fashion, estimating the head pose at the same time. The performance of our system is evaluated on the Pointing'04 head pose image data set. We first evaluate the classification performance of the tensorposes models on face image patches cropped according to the manually labeled nose-tip locations in the Pointing'04 data set. Using a leave-one-person-out evaluation strategy, we obtain the optimal parameters of the tensorposes model and compare the discriminative power of tensorposes models built on high order singular value decomposition (HOSVD) and multilinear independent component analysis (MICA) with that of naive principal component analysis (PCA) subspace models. The tensorposes models built with HOSVD and MICA perform similarly well, and both perform much better than the naive PCA subspace models. The tensorposes model is then used to automatically localize the nose-tip in a test image and to simultaneously estimate the head pose. The nose-tip localization and pose estimation accuracy of the proposed system are evaluated against the ground truth. Finally, a cross-database evaluation of our system is carried out on the Pointing'04 database, a selected subset of the CMU PIE database, and some pictures from the CLEAR'07 head pose evaluation database. The experiments show that our system generalizes reasonably well to real-world scenarios.
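The leave-one-person-out strategy mentioned in the abstract can be illustrated with a minimal sketch: every image of one person is held out for testing while the images of all other people are used for training, and this is repeated for each person. The function name `leave_one_person_out` and the toy person identifiers are assumptions made for illustration; they are not taken from the paper, and the sketch does not include the tensorposes model itself.

```python
from collections import defaultdict


def leave_one_person_out(person_ids):
    """Yield (held_out_person, train_indices, test_indices), one fold per person.

    person_ids[i] is the identity of the person shown in sample i.
    Hypothetical helper for illustration; not the authors' code.
    """
    # Group sample indices by the person they belong to.
    by_person = defaultdict(list)
    for idx, pid in enumerate(person_ids):
        by_person[pid].append(idx)

    # Hold out each person in turn; train on everyone else.
    for held_out in sorted(by_person):
        test_idx = list(by_person[held_out])
        train_idx = sorted(
            i
            for pid, idxs in by_person.items()
            if pid != held_out
            for i in idxs
        )
        yield held_out, train_idx, test_idx


# Toy example: three people with two images each (made-up identifiers).
person_ids = ["p1", "p1", "p2", "p2", "p3", "p3"]
folds = list(leave_one_person_out(person_ids))
for held_out, train_idx, test_idx in folds:
    print(held_out, train_idx, test_idx)
```

Because no person appears in both the training and the test indices of a fold, the resulting accuracy reflects performance on unseen identities, which is the setting a person-independent pose classifier has to handle.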
