Motionlet LLC coding for discriminative human pose estimation

Abstract3D human pose estimation is a challenging but important research topic with abundant applications. As for discriminative human pose estimation, the main goal is to learn a nonlinear mapping from image descriptors to 3D human pose configurations, which is difficult due to the high-dimensionality of human pose space and the multimodality of the distribution. To address these problems, we propose a novel motionlet LLC coding in a discriminative framework. A motionlet consists of training examples covering a local area in terms of image space, pose space and time stream. We first group most informative and helpful training examples into motionlets, then perform LLC Coding to learn the nonlinear mapping and get candidate poses, and finally choose the most appropriate pose as the result estimate. To further eliminate ambiguities and improve robustness, we extend our framework to incorporate multiviews. We conduct qualitative evaluation on our Taichi data set and quantitative evaluation on HumanEva data set, which show that our approach has gained the-state-of-the-art performance and significant improvement against previous approaches.

[1]  Mun Wai Lee,et al.  Human Upper Body Pose Estimation in Static Images , 2004, ECCV.

[2]  Ankur Agarwal,et al.  3D human pose from silhouettes by relevance vector regression , 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004..

[3]  Yun Fu,et al.  Temporal-Spatial Local Gaussian Process Experts for Human Pose Estimation , 2009, ACCV.

[4]  Trevor Darrell,et al.  Sparse probabilistic regression for activity-independent human pose inference , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[5]  Trevor Darrell,et al.  Inferring 3D structure with a statistical image-based shape model , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[6]  Yun Fu,et al.  Human Motion Tracking by Temporal-Spatial Local Gaussian Process Experts , 2011, IEEE Transactions on Image Processing.

[7]  Chun Chen,et al.  Pose Estimation with Motionlet LLC Coding , 2012, PCM.

[8]  Xuelong Li,et al.  Visual-Context Boosting for Eye Detection , 2010, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics).

[9]  Michael J. Black,et al.  HumanEva: Synchronized Video and Motion Capture Dataset for Evaluation of Articulated Human Motion , 2006 .

[10]  Rómer Rosales,et al.  Learning Body Pose via Specialized Maps , 2001, NIPS.

[11]  Ahmed M. Elgammal,et al.  Nonlinear manifold learning for dynamic shape and dynamic appearance , 2007, Comput. Vis. Image Underst..

[12]  Thomas Serre,et al.  Object recognition with features inspired by visual cortex , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[13]  Björn Stenger,et al.  Model-based hand tracking using a hierarchical Bayesian filter , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[14]  Yihong Gong,et al.  Locality-constrained Linear Coding for image classification , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[15]  Ankur Agarwal,et al.  Recovering 3D human pose from monocular images , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[16]  Cristian Sminchisescu,et al.  Discriminative density propagation for 3D human motion estimation , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[17]  Nicholas R. Howe,et al.  Silhouette lookup for monocular 3D pose tracking , 2007, Image Vis. Comput..

[18]  Ankur Agarwal,et al.  A Local Basis Representation for Estimating Human Pose from Cluttered Images , 2006, ACCV.

[19]  Daniel P. Huttenlocher,et al.  Pictorial Structures for Object Recognition , 2004, International Journal of Computer Vision.

[20]  Trevor Darrell,et al.  Fast pose estimation with parameter-sensitive hashing , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[21]  Cristian Sminchisescu,et al.  Semi-supervised Hierarchical Models for 3D Human Pose Reconstruction , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[22]  Yi Yang,et al.  Articulated pose estimation with flexible mixtures-of-parts , 2011, CVPR 2011.

[23]  Aphrodite Galata,et al.  Local Gaussian Processes for Pose Recognition from Noisy Inputs , 2010, BMVC.

[24]  Cristian Sminchisescu,et al.  Twin Gaussian Processes for Structured Prediction , 2010, International Journal of Computer Vision.

[25]  Ben Taskar,et al.  Cascaded Models for Articulated Pose Estimation , 2010, ECCV.

[26]  MengChu Zhou,et al.  Image Ratio Features for Facial Expression Recognition Application , 2010, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics).

[27]  Kun Duan,et al.  A Multi-layer Composite Model for Human Pose Estimation , 2012, BMVC.

[28]  Yihong Gong,et al.  Discriminative learning of visual words for 3D human pose estimation , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[29]  Adrian Hilton,et al.  Viewpoint invariant exemplar-based 3D human tracking , 2006, Comput. Vis. Image Underst..

[30]  Ronald Poppe,et al.  Evaluating Example-based Pose Estimation: Experiments on the HumanEva Sets , 2007 .

[31]  A. Elgammal,et al.  Inferring 3D body pose from silhouettes using activity manifold learning , 2004, CVPR 2004.

[32]  Thomas S. Huang,et al.  Discriminative estimation of 3D human pose using Gaussian processes , 2008, 2008 19th International Conference on Pattern Recognition.

[33]  Cristian Sminchisescu,et al.  Learning Joint Top-Down and Bottom-up Processes for 3D Visual Inference , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[34]  Yihong Gong,et al.  Nonlinear Learning using Local Coordinate Coding , 2009, NIPS.