Kernel locality-constrained sparse coding for head pose estimation

In many situations, it would be useful for a computer user interface to model where a person is looking and what that person is paying attention to. In this study, the authors describe a novel feature coding method for head pose estimation. The widely used sparse coding (SC) method encodes a test sample as a sparse linear combination of training samples. However, SC does not consider the underlying structure of the data in the feature space. In contrast, locality-constrained linear coding (LLC) uses locality constraints to project each input sample into its local coordinate system. Building on the recent success of LLC, the authors introduce locality-constrained sparse coding (LSC) to overcome this limitation of SC. The authors also propose kernel locality-constrained sparse coding, a non-linear extension of LSC. Using the kernel trick, the input data are implicitly mapped into the feature space associated with the kernel function. In experiments, the proposed algorithm was applied to head pose estimation, and the results demonstrated the effectiveness and robustness of the proposed method.
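The abstract describes the coding objectives only in words. As a rough illustration, the sketch below assumes an LSC objective of the form min_c ||phi(x) - sum_i c_i phi(b_i)||^2 + lam * sum_i d_i |c_i|, where the atoms b_i are training samples and the locality weight d_i = exp(dist(x, b_i) / sigma) is borrowed from the LLC locality adaptor. The paper's exact formulation, solver, and parameter values may differ. The solver here is ISTA (iterative shrinkage-thresholding), swapped in as an assumption, and the names `locality_sparse_code`, `linear_kernel`, `rbf_kernel`, `lam`, `sigma`, and `gamma` are hypothetical. Because every term is written through kernel evaluations, passing the linear kernel gives a sketch of LSC, while passing the Gaussian kernel gives a kernelized variant without ever forming phi explicitly.

```python
import math


def linear_kernel(a, b):
    """Plain inner product: with this kernel the sketch reduces to (linear) LSC."""
    return sum(ai * bi for ai, bi in zip(a, b))


def rbf_kernel(a, b, gamma=0.5):
    """Gaussian kernel; gamma=0.5 is an illustrative value, not taken from the paper."""
    return math.exp(-gamma * sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def locality_sparse_code(x, atoms, kernel=linear_kernel, lam=0.1, sigma=1.0, n_iter=500):
    """Hypothetical locality-weighted sparse coding solved with ISTA.

    Minimises ||phi(x) - sum_i c_i phi(b_i)||^2 + lam * sum_i d_i |c_i|,
    with d_i = exp(||phi(x) - phi(b_i)|| / sigma) (LLC-style locality adaptor).
    Every term is computed from kernel evaluations only (the kernel trick).
    """
    n = len(atoms)
    kxx = kernel(x, x)
    k = [kernel(b, x) for b in atoms]                       # k_i = <phi(b_i), phi(x)>
    K = [[kernel(bi, bj) for bj in atoms] for bi in atoms]  # Gram matrix of the atoms

    # Feature-space distance ||phi(x) - phi(b_i)||, clipped at 0 for round-off.
    dist = [math.sqrt(max(kxx - 2.0 * k[i] + K[i][i], 0.0)) for i in range(n)]
    d = [math.exp(di / sigma) for di in dist]  # distant atoms get larger penalty weights

    # The smooth term has gradient 2(Kc - k); 2 * (max absolute row sum of K)
    # upper-bounds its Lipschitz constant 2 * lambda_max(K), so 1/L is a safe step.
    L = max(2.0 * max(sum(abs(v) for v in row) for row in K), 1e-12)
    step = 1.0 / L

    c = [0.0] * n
    for _ in range(n_iter):
        grad = [2.0 * (sum(K[i][j] * c[j] for j in range(n)) - k[i]) for i in range(n)]
        v = [c[i] - step * grad[i] for i in range(n)]
        # Weighted soft-thresholding: the proximal step for lam * sum_i d_i |c_i|.
        c = [math.copysign(max(abs(v[i]) - step * lam * d[i], 0.0), v[i]) for i in range(n)]
    return c
```

For example, with the three unit vectors of R^3 as atoms and x = (0.9, 0.1, 0), the linear call keeps only the coefficient of the nearest atom (about 0.84) and sets the other two to zero, because the locality weights penalize distant atoms more heavily than plain SC would. Replacing every d_i with 1 would turn the same loop into ordinary l1-regularized sparse coding.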
