Beyond Physical Connections: Tree Models in Human Pose Estimation

Simple tree models for articulated objects prevails in the last decade. However, it is also believed that these simple tree models are not capable of capturing large variations in many scenarios, such as human pose estimation. This paper attempts to address three questions: 1) are simple tree models sufficient? more specifically, 2) how to use tree models effectively in human pose estimation? and 3) how shall we use combined parts together with single parts efficiently? Assuming we have a set of single parts and combined parts, and the goal is to estimate a joint distribution of their locations. We surprisingly find that no latent variables are introduced in the Leeds Sport Dataset (LSP) during learning latent trees for deformable model, which aims at approximating the joint distributions of body part locations using minimal tree structure. This suggests one can straightforwardly use a mixed representation of single and combined parts to approximate their joint distribution in a simple tree model. As such, one only needs to build Visual Categories of the combined parts, and then perform inference on the learned latent tree. Our method outperformed the state of the art on the LSP, both in the scenarios when the training images are from the same dataset and from the PARSE dataset. Experiments on animal images from the VOC challenge further support our findings.

[1]  C. N. Liu,et al.  Approximating discrete probability distributions with dependence trees , 1968, IEEE Trans. Inf. Theory.

[2]  Remo Guidieri Res , 1995, RES: Anthropology and Aesthetics.

[3]  Daniel P. Huttenlocher,et al.  Pictorial Structures for Object Recognition , 2004, International Journal of Computer Vision.

[4]  Bill Triggs,et al.  Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[5]  Luc Van Gool,et al.  The 2005 PASCAL Visual Object Classes Challenge , 2005, MLCW.

[6]  Deva Ramanan,et al.  Learning to parse images of articulated bodies , 2006, NIPS.

[7]  Andrew Zisserman,et al.  Progressive search space reduction for human pose estimation , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[8]  David A. McAllester,et al.  Object Detection with Discriminatively Trained Part Based Models , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[9]  Ben Taskar,et al.  Cascaded Models for Articulated Pose Estimation , 2010, ECCV.

[10]  Mark Everingham,et al.  Clustered Pose and Nonlinear Appearance Models for Human Pose Estimation , 2010, BMVC.

[11]  Subhransu Maji,et al.  Detecting People Using Mutually Consistent Poselet Activations , 2010, ECCV.

[12]  Yang Wang,et al.  Learning hierarchical poselets for human parsing , 2011, CVPR 2011.

[13]  Mark Everingham,et al.  Learning effective human pose estimation from inaccurate annotation , 2011, CVPR 2011.

[14]  Vincent Y. F. Tan,et al.  Learning Latent Tree Graphical Models , 2010, J. Mach. Learn. Res..

[15]  Yi Yang,et al.  Articulated pose estimation with flexible mixtures-of-parts , 2011, CVPR 2011.

[16]  Silvio Savarese,et al.  Articulated part-based model for joint object detection and pose estimation , 2011, 2011 International Conference on Computer Vision.

[17]  Yuandong Tian,et al.  Exploring the Spatial Hierarchy of Mixture Models for Human Pose Estimation , 2012, ECCV.

[18]  Alexei A. Efros,et al.  How Important Are "Deformable Parts" in the Deformable Parts Model? , 2012, ECCV Workshops.

[19]  Luc Van Gool,et al.  The Pascal Visual Object Classes Challenge: A Retrospective , 2014, International Journal of Computer Vision.