3D human pose estimation from image using couple sparse coding

Recent studies have demonstrated that high-level semantics in data can be captured using sparse representation. In this paper, we propose an approach to human body pose estimation in static images based on sparse representation. Given a visual input, the objective is to estimate the 3D human body pose using both feature space information and the geometrical information of the pose space. Assuming that each data point and its neighbors are likely to reside on a locally linear patch of the underlying manifold, our method learns the sparse representation of a new input using both feature and pose space information, and then estimates the corresponding 3D pose as a linear combination of the bases of the pose dictionary. Two strategies for dictionary construction are presented: (i) constructing the dictionary by randomly selecting frames of a sequence and (ii) selecting specific frames of a sequence as dictionary atoms. We analyze the effect of each strategy on the accuracy of pose estimation. Extensive experiments on datasets of various human activities show that the proposed method outperforms state-of-the-art methods.
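The core idea of estimating a pose as a linear combination of pose dictionary atoms can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes a feature dictionary `F` and a pose dictionary `P` whose columns are paired training frames, it computes the sparse code from the feature space only with plain orthogonal matching pursuit, and it omits the pose-space term of the proposed method. The names `omp`, `estimate_pose`, `F`, `P`, and `W`, as well as the synthetic linear feature-to-pose map, are hypothetical and exist only for this illustration.

```python
import numpy as np


def omp(D, x, max_atoms, tol=1e-10):
    """Orthogonal matching pursuit: code x with at most max_atoms columns of D.

    D is assumed to have unit-norm columns.
    """
    residual = x.astype(float).copy()
    support = []
    sol = np.zeros(0)
    while len(support) < max_atoms and np.linalg.norm(residual) > tol:
        # Pick the unused atom most correlated with the current residual.
        corr = np.abs(D.T @ residual)
        if support:
            corr[support] = 0.0
        support.append(int(np.argmax(corr)))
        # Refit the coefficients of all selected atoms by least squares.
        sol, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ sol
    code = np.zeros(D.shape[1])
    if support:
        code[support] = sol
    return code


def estimate_pose(F, P, x, max_atoms):
    """Code the feature vector x over F, then reuse the weights on P."""
    norms = np.linalg.norm(F, axis=0)
    code = omp(F / norms, x, max_atoms)
    alpha = code / norms  # weights with respect to the original atoms of F
    return P @ alpha, alpha


# Synthetic, illustrative data: poses are a linear function of the features,
# which mimics the locally linear assumption in the simplest possible way.
rng = np.random.default_rng(0)
d_feat, d_pose, n_atoms = 20, 12, 50
F = rng.standard_normal((d_feat, n_atoms))   # feature dictionary (one frame per column)
W = rng.standard_normal((d_pose, d_feat))    # hypothetical feature-to-pose map
P = W @ F                                    # pose dictionary paired with F

# A new input built from three dictionary frames.
x = F[:, [3, 17, 41]] @ np.array([0.5, 0.3, 0.2])
pose_est, alpha = estimate_pose(F, P, x, max_atoms=d_feat)
pose_true = W @ x
print("pose error:", np.linalg.norm(pose_est - pose_true))
print("atoms used:", np.count_nonzero(alpha))
```

Normalizing the feature atoms before coding and then rescaling the weights keeps the feature and pose dictionaries coupled, so the same coefficients apply to both. Real image features and poses are related only locally linearly, which is why the choice of dictionary atoms discussed in the abstract matters.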
