Articulated Pose Estimation Using Hierarchical Exemplar-Based Models

Exemplar-based models have achieved great success on localizing the parts of semi-rigid objects. However, their efficacy on highly articulated objects such as humans is yet to be explored. Inspired by hierarchical object representation and recent application of Deep Convolutional Neural Networks (DCNNs) on human pose estimation, we propose a novel formulation that incorporates both hierarchical exemplar-based models and DCNNs in the spatial terms. Specifically, we obtain more expressive spatial models by assuming independence between exemplars at different levels in the hierarchy; we also obtain stronger spatial constraints by inferring the spatial relations between parts at the same level. As our method strikes a good balance between expressiveness and strength of spatial models, it is both effective and generalizable, achieving state-of-the-art results on different benchmarks: Leeds Sports Dataset and CUB-200-2011.

[1]  Yang Wang,et al.  Learning hierarchical poselets for human parsing , 2011, CVPR 2011.

[2]  Antonio Torralba,et al.  HOGgles: Visualizing Object Detection Features , 2013, 2013 IEEE International Conference on Computer Vision.

[3]  Jonathan Tompson,et al.  Learning Human Pose Estimation Features with Convolutional Networks , 2013, ICLR.

[4]  David J. Kriegman,et al.  Localizing parts of faces using a consensus of exemplars , 2011, CVPR.

[5]  Mark Everingham,et al.  Clustered Pose and Nonlinear Appearance Models for Human Pose Estimation , 2010, BMVC.

[6]  Christian Szegedy,et al.  DeepPose: Human Pose Estimation via Deep Neural Networks , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[7]  Alan L. Yuille,et al.  Articulated Pose Estimation by a Graphical Model with Image Dependent Pairwise Relations , 2014, NIPS.

[8]  Peter N. Belhumeur,et al.  Part-Pair Representation for Part Localization , 2014, ECCV.

[9]  Ben Taskar,et al.  MODEC: Multimodal Decomposable Models for Human Pose Estimation , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[10]  Peter V. Gehler,et al.  Strong Appearance and Expressive Spatial Models for Human Pose Estimation , 2013, 2013 IEEE International Conference on Computer Vision.

[11]  Thomas Hofmann,et al.  Support vector machine learning for interdependent and structured output spaces , 2004, ICML.

[12]  Dumitru Erhan,et al.  Deep Neural Networks for Object Detection , 2013, NIPS.

[13]  Peter V. Gehler,et al.  Poselet Conditioned Pictorial Structures , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[14]  Satoshi Ito,et al.  Co-occurrence Histograms of Oriented Gradients for Human Detection , 2010 .

[15]  Varun Ramakrishna,et al.  Pose Machines: Articulated Pose Estimation via Inference Machines , 2014, ECCV.

[16]  Daniel P. Huttenlocher,et al.  Pictorial Structures for Object Recognition , 2004, International Journal of Computer Vision.

[17]  Jitendra Malik,et al.  Poselets: Body part detectors trained using 3D human pose annotations , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[18]  Trevor Darrell,et al.  Caffe: Convolutional Architecture for Fast Feature Embedding , 2014, ACM Multimedia.

[19]  Ben Taskar,et al.  Adaptive pose priors for pictorial structures , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[20]  Yi Li,et al.  Learning Visual Symbols for Parsing Human Poses in Images , 2013, IJCAI.

[21]  Yi Yang,et al.  Articulated pose estimation with flexible mixtures-of-parts , 2011, CVPR 2011.

[22]  Pietro Perona,et al.  Strong supervision from weak annotation: Interactive training of deformable part models , 2011, 2011 International Conference on Computer Vision.

[23]  Xiang Zhang,et al.  OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks , 2013, ICLR.

[24]  Song-Chun Zhu,et al.  Integrating Grammar and Segmentation for Human Pose Estimation , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[25]  Jonathan Tompson,et al.  Joint Training of a Convolutional Network and a Graphical Model for Human Pose Estimation , 2014, NIPS.

[26]  Peter N. Belhumeur,et al.  Bird Part Localization Using Exemplar-Based Models with Enforced Pose and Subcategory Consistency , 2013, 2013 IEEE International Conference on Computer Vision.

[27]  Subhransu Maji,et al.  Detecting People Using Mutually Consistent Poselet Activations , 2010, ECCV.

[28]  Silvio Savarese,et al.  Articulated part-based model for joint object detection and pose estimation , 2011, 2011 International Conference on Computer Vision.

[29]  Trevor Darrell,et al.  Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.