Articulated pose estimation with flexible mixtures-of-parts

We describe a method for human pose estimation in static images based on a novel representation of part models. Notably, we do not use articulated limb parts, but rather capture orientation with a mixture of templates for each part. We describe a general, flexible mixture model for capturing contextual co-occurrence relations between parts, augmenting standard spring models that encode spatial relations. We show that such relations can capture notions of local rigidity. When co-occurrence and spatial relations are tree-structured, our model can be efficiently optimized with dynamic programming. We present experimental results on standard benchmarks for pose estimation that indicate our approach is the state-of-the-art system for pose estimation, outperforming past work by 50% while being orders of magnitude faster.

[1]  Martin A. Fischler,et al.  The Representation and Matching of Pictorial Structures , 1973, IEEE Transactions on Computers.

[2]  D. Marr,et al.  Representation and recognition of the spatial organization of three-dimensional shapes , 1978, Proceedings of the Royal Society of London. Series B. Biological Sciences.

[3]  J. O'Rourke,et al.  Model-based image analysis of human motion using constraint propagation , 1980, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[4]  David C. Hogg Model-based vision: a program to see a walking person , 1983, Image Vis. Comput..

[5]  K. Rohr Towards model-based recognition of human movements in image sequences , 1994 .

[6]  Alan L. Yuille,et al.  The Concave-Convex Procedure , 2003, Neural Computation.

[7]  B. Schiele,et al.  Combined Object Categorization and Segmentation With an Implicit Shape Model , 2004 .

[8]  Christopher Rasmussen,et al.  Grouping dominant orientations for ill-structured road following , 2004, CVPR 2004.

[9]  Jitendra Malik,et al.  Recovering human body configurations: combining segmentation and recognition , 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004..

[10]  Daniel P. Huttenlocher,et al.  Pictorial Structures for Object Recognition , 2004, International Journal of Computer Vision.

[11]  M. Lee,et al.  Proposal maps driven MCMC for estimating human body pose in static images , 2004, CVPR 2004.

[12]  Bill Triggs,et al.  Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[13]  Vincent Lepetit,et al.  Randomized trees for real-time keypoint recognition , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[14]  Daniel P. Huttenlocher,et al.  Beyond trees: common-factor models for 2D human pose recovery , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[15]  Michael J. Black,et al.  Measure Locally, Reason Globally: Occlusion-sensitive Articulated Pose Estimation , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[16]  Cristian Sminchisescu,et al.  Training Deformable Models for Localization , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[17]  Deva Ramanan,et al.  Learning to parse images of articulated bodies , 2006, NIPS.

[18]  Shimon Ullman,et al.  Semantic Hierarchies for Recognizing Objects and Parts , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[19]  Jianbo Shi,et al.  Bottom-up Recognition and Parsing of the Human Body , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[20]  Yang Wang,et al.  Multiple Tree Models for Occlusion and Spatial Constraints in Human Pose Estimation , 2008, ECCV.

[21]  Andrew Zisserman,et al.  Progressive search space reduction for human pose estimation , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[22]  Thorsten Joachims,et al.  Training structural SVMs when exact inference is intractable , 2008, ICML '08.

[23]  Mark Everingham,et al.  Combining discriminative appearance and segmentation cues for articulated human pose estimation , 2009, 2009 IEEE 12th International Conference on Computer Vision Workshops, ICCV Workshops.

[24]  Vittorio Ferrari,et al.  Better Appearance Models for Pictorial Structures , 2009, BMVC.

[25]  Jitendra Malik,et al.  Poselets: Body part detectors trained using 3D human pose annotations , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[26]  Bernt Schiele,et al.  Pictorial structures revisited: People detection and articulated pose estimation , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[27]  Andrew Zisserman,et al.  Efficient discriminative learning of parts-based models , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[28]  Luc Van Gool,et al.  The Pascal Visual Object Classes (VOC) Challenge , 2010, International Journal of Computer Vision.

[29]  David A. McAllester,et al.  Object Detection with Discriminatively Trained Part Based Models , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[30]  Ben Taskar,et al.  Adaptive pose priors for pictorial structures , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[31]  Stan Sclaroff,et al.  Fast globally optimal 2D human detection with loopy graph models , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[32]  Ramakant Nevatia,et al.  Efficient Inference with Multiple Heterogeneous Part Detectors for Human Pose Estimation , 2010, ECCV.

[33]  Ben Taskar,et al.  Cascaded Models for Articulated Pose Estimation , 2010, ECCV.

[34]  Mark Everingham,et al.  Clustered Pose and Nonlinear Appearance Models for Human Pose Estimation , 2010, BMVC.

[35]  David A. Forsyth,et al.  Improved Human Parsing with a Full Relational Model , 2010, ECCV.

[36]  Pedro F. Felzenszwalb Object detection grammars , 2011, 2011 IEEE International Conference on Computer Vision Workshops (ICCV Workshops).

[37]  Andrew W. Fitzgibbon,et al.  Real-time human pose recognition in parts from single depth images , 2011, CVPR 2011.