Efficient human pose estimation via parsing a tree structure based human model

Human pose estimation is the task of determining the states (location, orientation and scale) of each body part. It is important for many vision understanding applications, e.g. visual interactive gaming, immersive virtual reality, content-based image retrieval, etc. However, it remains a challenging task because of unknown image background, presence of clutter, partial occlusion and especially the high dimensional state space (usually 30+ dimensions). In this paper, we contribute to human pose estimation in two aspects. First, we design two efficient Markov Chain dynamics under the data-driven Markov Chain Monte Carlo (DDMCMC) framework to effectively explore the complex solution space. Second, we parse the tree structure state space into a lexicographic order according to the image observations and body topology, and the optimization process is conducted in this order. This realizes a much more efficient exploration than the sampling based search and exhaustive search, and thus achieves a tremendous speed-up. Experimental results demonstrate the efficiency and effectiveness of the proposed method in estimating various kinds of human poses, even with cluttered background , poor illumination or partial self-occlusion.

[1]  Ramakant Nevatia,et al.  Bayesian human segmentation in crowded situations , 2003, 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings..

[2]  William T. Freeman,et al.  Bayesian Reconstruction of 3D Human Motion from Single-Camera Video , 1999, NIPS.

[3]  Sangyop Lee Boundary structure of hyperbolic 3-manifolds admitting annular fillings at large distance , 2006 .

[4]  Jitendra Malik,et al.  Recovering human body configurations: combining segmentation and recognition , 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004..

[5]  Jitendra Malik,et al.  Shape matching and object recognition using shape contexts , 2010, 2010 3rd International Conference on Computer Science and Information Technology.

[6]  Jitendra Malik,et al.  Recovering human body configurations using pairwise constraints between parts , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[7]  Cordelia Schmid,et al.  Learning to Parse Pictures of People , 2002, ECCV.

[8]  Larry S. Davis,et al.  Human body pose estimation using silhouette shape analysis , 2003, Proceedings of the IEEE Conference on Advanced Video and Signal Based Surveillance, 2003..

[9]  Stephen J. McKenna,et al.  Human Pose Estimation Using Partial Configurations and Probabilistic Regions , 2007, International Journal of Computer Vision.

[10]  Jitendra Malik,et al.  Estimating Human Body Configurations Using Shape Context Matching , 2002, ECCV.

[11]  Cristian Sminchisescu,et al.  Learning Joint Top-Down and Bottom-up Processes for 3D Visual Inference , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[12]  Harry Shum,et al.  Image segmentation by data driven Markov chain Monte Carlo , 2001, Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001.

[13]  W. Eric L. Grimson,et al.  Adaptive background mixture models for real-time tracking , 1999, Proceedings. 1999 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Cat. No PR00149).

[14]  Deva Ramanan,et al.  Learning to parse images of articulated bodies , 2006, NIPS.

[15]  Zhen-Han Tu,et al.  Rigidity of proper holomorphic mappings between equidimensional bounded symmetric domains , 2001 .

[16]  Mun Wai Lee,et al.  A model-based approach for estimating human 3D poses in static images , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[17]  Jianbo Shi,et al.  Bottom-up Recognition and Parsing of the Human Body , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[18]  Michael J. Black,et al.  Measure Locally, Reason Globally: Occlusion-sensitive Articulated Pose Estimation , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[19]  B. S. Manjunath,et al.  Unsupervised Segmentation of Color-Texture Regions in Images and Video , 2001, IEEE Trans. Pattern Anal. Mach. Intell..

[20]  David A. Forsyth,et al.  Strike a pose: tracking people by finding stylized poses , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[21]  Michael Collins,et al.  Three Generative, Lexicalised Models for Statistical Parsing , 1997, ACL.

[22]  Gang Hua,et al.  Learning to estimate human pose with data driven belief propagation , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[23]  Jiebo Luo,et al.  Body Localization in Still Images Using Hierarchical Models and Hybrid Search , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[24]  Juan Carlos Niebles,et al.  A Hierarchical Model of Shape and Appearance for Human Action Classification , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[25]  Daniel P. Huttenlocher,et al.  Pictorial Structures for Object Recognition , 2004, International Journal of Computer Vision.

[26]  Trevor Darrell,et al.  Fast pose estimation with parameter-sensitive hashing , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[27]  Daniel P. Huttenlocher,et al.  Beyond trees: common-factor models for 2D human pose recovery , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.