Learning hierarchical poselets for human parsing

We consider the problem of human parsing with part-based models. Most previous work in part-based models only considers rigid parts (e.g. torso, head, half limbs) guided by human anatomy. We argue that this representation of parts is not necessarily appropriate for human parsing. In this paper, we introduce hierarchical poselets–a new representation for human parsing. Hierarchical poselets can be rigid parts, but they can also be parts that cover large portions of human bodies (e.g. torso + left arm). In the extreme case, they can be the whole bodies. We develop a structured model to organize poselets in a hierarchical way and learn the model parameters in a max-margin framework. We demonstrate the superior performance of our proposed approach on two datasets with aggressive pose variations.

[1]  Jitendra Malik,et al.  Recovering human body configurations: combining segmentation and recognition , 2004, CVPR 2004.

[2]  Jitendra Malik,et al.  Recovering human body configurations using pairwise constraints between parts , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[3]  Andrew Zisserman,et al.  Progressive search space reduction for human pose estimation , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[4]  Michael J. Black,et al.  Cardboard people: a parameterized model of articulated image motion , 1996, Proceedings of the Second International Conference on Automatic Face and Gesture Recognition.

[5]  Yifei Lu,et al.  Max Margin AND/OR Graph learning for parsing the human body , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[6]  Bill Triggs,et al.  Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[7]  Jianbo Shi,et al.  Bottom-up Recognition and Parsing of the Human Body , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[8]  Hao Jiang,et al.  Global pose estimation using non-tree models , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[9]  Michael J. Black,et al.  Measure Locally, Reason Globally: Occlusion-sensitive Articulated Pose Estimation , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[10]  Thorsten Joachims,et al.  Cutting-plane training of structural SVMs , 2009, Machine Learning.

[11]  Daniel P. Huttenlocher,et al.  Pictorial Structures for Object Recognition , 2004, International Journal of Computer Vision.

[12]  Ben Taskar,et al.  Adaptive pose priors for pictorial structures , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[13]  Trevor Darrell,et al.  Fast pose estimation with parameter-sensitive hashing , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[14]  Bernt Schiele,et al.  Pictorial structures revisited: People detection and articulated pose estimation , 2009, CVPR.

[15]  David A. Forsyth,et al.  Strike a pose: tracking people by finding stylized poses , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[16]  Stefan Carlsson,et al.  Recognizing and Tracking Human Action , 2002, ECCV.

[17]  Yang Wang,et al.  Recognizing human actions from still images with latent poses , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[18]  David A. McAllester,et al.  Object Detection with Discriminatively Trained Part Based Models , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[19]  David A. Forsyth,et al.  Improved Human Parsing with a Full Relational Model , 2010, ECCV.

[20]  Yang Wang,et al.  Multiple Tree Models for Occlusion and Spatial Constraints in Human Pose Estimation , 2008, ECCV.

[21]  Deva Ramanan,et al.  Learning to parse images of articulated bodies , 2006, NIPS.

[22]  Vittorio Ferrari,et al.  Better Appearance Models for Pictorial Structures , 2009, BMVC.

[23]  Jitendra Malik,et al.  Recovering human body configurations: combining segmentation and recognition , 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004..

[24]  Subhransu Maji,et al.  Detecting People Using Mutually Consistent Poselet Activations , 2010, ECCV.

[25]  Greg Mori,et al.  Guiding model search using segmentation , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[26]  Ben Taskar,et al.  Cascaded Models for Articulated Pose Estimation , 2010, ECCV.

[27]  Daniel P. Huttenlocher,et al.  Beyond trees: common-factor models for 2D human pose recovery , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[28]  Jitendra Malik,et al.  Poselets: Body part detectors trained using 3D human pose annotations , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[29]  Jitendra Malik,et al.  Estimating Human Body Configurations Using Shape Context Matching , 2002, ECCV.

[30]  Thomas Hofmann,et al.  Large Margin Methods for Structured and Interdependent Output Variables , 2005, J. Mach. Learn. Res..

[31]  Stan Sclaroff,et al.  Fast globally optimal 2D human detection with loopy graph models , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.