ORGM: Occlusion Relational Graphical Model for Human Pose Estimation

Articulated human pose estimation from monocular image is a challenging problem in computer vision. Occlusion is a main challenge for human pose estimation, which is largely ignored in popular tree structured models. The tree structured model is simple and convenient for exact inference, but short in modeling the occlusion coherence especially in the case of self-occlusion. We propose an occlusion relational graphical model, which is able to model both self-occlusion and occlusion by the other objects simultaneously. The proposed model can encode the interactions between human body parts and objects, and enables it to learn occlusion coherence from data discriminatively. We evaluate our model on several public benchmarks for human pose estimation, including challenging subsets featuring significant occlusion. The experimental results show that our method is superior to the previous state-of-the-arts, and is robust to occlusion for 2D human pose estimation.

[1]  Yair Weiss,et al.  Correctness of Local Probability Propagation in Graphical Models with Loops , 2000, Neural Computation.

[2]  Ivan Laptev,et al.  Object Detection Using Strongly-Supervised Deformable Part Models , 2012, ECCV.

[3]  Deva Ramanan,et al.  Analyzing 3D Objects in Cluttered Images , 2012, NIPS.

[4]  Jitendra Malik,et al.  Poselets: Body part detectors trained using 3D human pose annotations , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[5]  Shuicheng Yan,et al.  An HOG-LBP human detector with partial occlusion handling , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[6]  Deva Ramanan,et al.  Detecting Actions, Poses, and Objects with Relational Phraselets , 2012, ECCV.

[7]  Mark Everingham,et al.  Learning effective human pose estimation from inaccurate annotation , 2011, CVPR 2011.

[8]  Yi Yang,et al.  Articulated Human Detection with Flexible Mixtures of Parts , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[9]  Roland Göcke,et al.  Monocular Image 3D Human Pose Estimation under Self-Occlusion , 2013, 2013 IEEE International Conference on Computer Vision.

[10]  Nir Friedman,et al.  Probabilistic Graphical Models - Principles and Techniques , 2009 .

[11]  Jitendra Malik,et al.  Human Pose Estimation with Iterative Error Feedback , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[12]  Nassir Navab,et al.  Robust Optimization for Deep Regression , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[13]  Andrew Zisserman,et al.  2D Articulated Human Pose Estimation and Retrieval in (Almost) Unconstrained Still Images , 2012, International Journal of Computer Vision.

[14]  Steve Marschner,et al.  Caliber: Camera Localization and Calibration Using Rigidity Constraints , 2016, International Journal of Computer Vision.

[15]  Bernt Schiele,et al.  2D Human Pose Estimation: New Benchmark and State of the Art Analysis , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[16]  Vittorio Ferrari,et al.  Appearance Sharing for Collective Human Pose Estimation , 2012, ACCV.

[17]  Ben Taskar,et al.  MODEC: Multimodal Decomposable Models for Human Pose Estimation , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[18]  Kaiqi Huang,et al.  GRMA: Generalized Range Move Algorithms for the Efficient Optimization of MRFs , 2017, International Journal of Computer Vision.

[19]  David A. Forsyth,et al.  Improved Human Parsing with a Full Relational Model , 2010, ECCV.

[20]  Jitendra Malik,et al.  Recovering human body configurations using pairwise constraints between parts , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[21]  Jitendra Malik,et al.  Estimating Human Body Configurations Using Shape Context Matching , 2002, ECCV.

[22]  Kaiqi Huang,et al.  Beyond Tree Structure Models: A New Occlusion Aware Graphical Model for Human Pose Estimation , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[23]  Deva Ramanan,et al.  Dual coordinate solvers for large-scale structural SVMs , 2013, ArXiv.

[24]  Silvio Savarese,et al.  An efficient branch-and-bound algorithm for optimal human pose estimation , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[25]  Stan Sclaroff,et al.  Fast globally optimal 2D human detection with loopy graph models , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[26]  Yi Yang,et al.  Articulated pose estimation with flexible mixtures-of-parts , 2011, CVPR 2011.

[27]  Deva Ramanan,et al.  Part-Based Models for Finding People and Estimating Their Pose , 2011, Visual Analysis of Humans.

[28]  Kang Zheng,et al.  Combining local appearance and holistic view: Dual-Source Deep Neural Networks for human pose estimation , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[29]  Peter V. Gehler,et al.  Poselet Conditioned Pictorial Structures , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[30]  Bernt Schiele,et al.  Pictorial structures revisited: People detection and articulated pose estimation , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[31]  Alan L. Yuille,et al.  Parsing occluded people by flexible compositions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[32]  Kaiqi Huang,et al.  Context aware model for articulated human pose estimation , 2015, 2015 IEEE International Conference on Image Processing (ICIP).

[33]  Thorsten Joachims,et al.  Cutting-plane training of structural SVMs , 2009, Machine Learning.

[34]  David A. McAllester,et al.  Object Detection with Discriminatively Trained Part Based Models , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[35]  Jonathan Tompson,et al.  Joint Training of a Convolutional Network and a Graphical Model for Human Pose Estimation , 2014, NIPS.

[36]  Martin A. Fischler,et al.  The Representation and Matching of Pictorial Structures , 1973, IEEE Transactions on Computers.

[37]  Ben Taskar,et al.  Parsing human motion with stretchable models , 2011, CVPR 2011.

[38]  Subhransu Maji,et al.  Describing people: A poselet-based approach to attribute classification , 2011, 2011 International Conference on Computer Vision.

[39]  Yi Yang,et al.  Parsing Occluded People , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[40]  Andrew Zisserman,et al.  Progressive search space reduction for human pose estimation , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[41]  Song-Chun Zhu,et al.  Integrating Grammar and Segmentation for Human Pose Estimation , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[42]  Sekhar Tatikonda,et al.  Loopy Belief Propogation and Gibbs Measures , 2002, UAI.

[43]  Silvio Savarese,et al.  Articulated part-based model for joint object detection and pose estimation , 2011, 2011 International Conference on Computer Vision.

[44]  Yuandong Tian,et al.  Exploring the Spatial Hierarchy of Mixture Models for Human Pose Estimation , 2012, ECCV.

[45]  Luc Van Gool,et al.  Human Pose Estimation Using Body Parts Dependent Joint Regressors , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[46]  Yang Wang,et al.  Learning hierarchical poselets for human parsing , 2011, CVPR 2011.

[47]  Michael J. Black,et al.  Measure Locally, Reason Globally: Occlusion-sensitive Articulated Pose Estimation , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[48]  Varun Ramakrishna,et al.  Pose Machines: Articulated Pose Estimation via Inference Machines , 2014, ECCV.

[49]  Daniel P. Huttenlocher,et al.  Pictorial Structures for Object Recognition , 2004, International Journal of Computer Vision.

[50]  Charless C. Fowlkes,et al.  Occlusion Coherence: Localizing Occluded Faces with a Hierarchical Deformable Part Model , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[51]  Yi Li,et al.  Beyond Physical Connections: Tree Models in Human Pose Estimation , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[52]  Christian Szegedy,et al.  DeepPose: Human Pose Estimation via Deep Neural Networks , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[53]  Yang Wang,et al.  Multiple Tree Models for Occlusion and Spatial Constraints in Human Pose Estimation , 2008, ECCV.

[54]  Nikos Komodakis,et al.  MRF Energy Minimization and Beyond via Dual Decomposition , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[55]  Deva Ramanan,et al.  Learning to parse images of articulated bodies , 2006, NIPS.

[56]  Peter V. Gehler,et al.  Human Pose Estimation with Fields of Parts , 2014, ECCV.

[57]  Ganesh Sundaramoorthi,et al.  Modeling Self-Occlusions in Dynamic Shape and Appearance Tracking , 2013, 2013 IEEE International Conference on Computer Vision.

[58]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[59]  Daphne Koller,et al.  A segmentation-aware object detection model with occlusion handling , 2011, CVPR 2011.

[60]  Trevor Darrell,et al.  Caffe: Convolutional Architecture for Fast Feature Embedding , 2014, ACM Multimedia.

[61]  Daniel P. Huttenlocher,et al.  Beyond trees: common-factor models for 2D human pose recovery , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[62]  Mark Everingham,et al.  Clustered Pose and Nonlinear Appearance Models for Human Pose Estimation , 2010, BMVC.

[63]  Jitendra Malik,et al.  Using k-Poselets for Detecting People and Localizing Their Keypoints , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[64]  David A. McAllester,et al.  Object Detection with Grammar Models , 2011, NIPS.

[65]  Xiaogang Wang,et al.  Multi-source Deep Learning for Human Pose Estimation , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[66]  Alan L. Yuille,et al.  Articulated Pose Estimation by a Graphical Model with Image Dependent Pairwise Relations , 2014, NIPS.

[67]  Peter V. Gehler,et al.  Strong Appearance and Expressive Spatial Models for Human Pose Estimation , 2013, 2013 IEEE International Conference on Computer Vision.

[68]  Bill Triggs,et al.  Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).