An efficient branch-and-bound algorithm for optimal human pose estimation

Human pose estimation in a static image is a challenging problem in computer vision because body part configurations are often subject to severe deformations and occlusions. Moreover, many applications require pose estimation to be efficient. The trade-off between accuracy and efficiency has been explored in a large number of approaches. On the one hand, models with simple representations (such as tree or star models) admit efficient inference, but they are often prone to body part misclassification errors. On the other hand, models with rich representations (i.e., loopy graphical models) are theoretically more robust, but their inference complexity may increase dramatically. In this work, we propose an efficient and exact inference algorithm based on branch-and-bound to solve the human pose estimation problem on loopy graphical models. We show that our method is empirically much faster (about 74 times) than the state-of-the-art exact inference algorithm [21]. By extending a state-of-the-art tree model [16] to a loopy graphical model, we show that the estimation accuracy improves for most of the body parts (especially the lower arms) on popular datasets such as Buffy [7] and Stickmen [5]. Finally, our method can be used to exactly solve most of the inference problems on Stretchable Models [18] (which contain a few hundred variables) in just a few minutes.
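
To make the idea of exact branch-and-bound inference on a loopy model concrete, the following is a minimal sketch and not the algorithm proposed in this work. It assumes a pairwise model whose score is a sum of unary and pairwise terms, uses a deliberately naive upper bound (each term maximized independently), and explores partial assignments best-first. The function name `map_branch_and_bound`, the data layout, and the toy three-part cycle are all hypothetical choices made for illustration.

```python
import heapq
import itertools


def map_branch_and_bound(unary, pairwise):
    """Maximize sum_i unary[i][x_i] + sum_(i,j) pairwise[(i, j)][x_i][x_j].

    unary:    list with one list of state scores per variable.
    pairwise: dict mapping (i, j) to a score table indexed as table[x_i][x_j].
    Returns (best_score, best_assignment).
    """
    n = len(unary)

    def upper_bound(partial):
        # `partial` fixes the states of variables 0 .. len(partial) - 1.
        # Every term is maximized independently over the states still allowed,
        # so the result can only overestimate the best completion.
        k = len(partial)

        def states(i):
            return [partial[i]] if i < k else range(len(unary[i]))

        total = sum(max(unary[i][s] for s in states(i)) for i in range(n))
        for (i, j), table in pairwise.items():
            total += max(table[a][b] for a in states(i) for b in states(j))
        return total

    # heapq is a min-heap, so bounds are negated; the counter breaks ties.
    counter = itertools.count()
    heap = [(-upper_bound(()), next(counter), ())]
    while heap:
        neg_bound, _, partial = heapq.heappop(heap)
        if len(partial) == n:
            # For a complete assignment the bound equals the exact score, and
            # no other queued node has a larger bound, so this is optimal.
            return -neg_bound, list(partial)
        i = len(partial)  # branch on the next unassigned variable
        for s in range(len(unary[i])):
            child = partial + (s,)
            heapq.heappush(heap, (-upper_bound(child), next(counter), child))
    raise ValueError("every variable needs at least one state")


# Toy loopy model: three binary "parts" connected in a cycle (assumed data).
unary = [[0, 1], [2, 0], [0, 0]]
pairwise = {
    (0, 1): [[3, 0], [0, 3]],
    (1, 2): [[3, 0], [0, 3]],
    (0, 2): [[0, 4], [4, 0]],
}
score, assignment = map_branch_and_bound(unary, pairwise)
print(score, assignment)
```

The search is exact because the bound never underestimates any completion of a partial assignment. How quickly it terminates depends on how tight the bound is, so a practical method would need a much stronger bound than the naive one assumed here.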

[1] Daniel P. Huttenlocher, et al. Pictorial Structures for Object Recognition, 2004, International Journal of Computer Vision.

[2] Ben Taskar, et al. Adaptive pose priors for pictorial structures, 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[3] Stan Sclaroff, et al. Fast globally optimal 2D human detection with loopy graph models, 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[4] Ben Taskar, et al. Sidestepping Intractable Inference with Structured Ensemble Cascades, 2010, NIPS.

[5] Bernt Schiele, et al. Pictorial structures revisited: People detection and articulated pose estimation, 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[6] William T. Freeman, et al. Understanding belief propagation and its generalizations, 2003.

[7] Yifei Lu, et al. Max Margin AND/OR Graph learning for parsing the human body, 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[8] Sebastian Nowozin, et al. Tighter Relaxations for MAP-MRF Inference: A Local Primal-Dual Gap based Separation Algorithm, 2011, AISTATS.

[9] Silvio Savarese, et al. Efficient and Exact MAP-MRF Inference using Branch and Bound, 2012, AISTATS.

[10] Andrew Zisserman, et al. Progressive search space reduction for human pose estimation, 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[11] Rina Dechter, et al. Best-First AND/OR Search for Graphical Models, 2007, AAAI.

[12] Solomon Eyal Shimony, et al. Finding MAPs for Belief Networks is NP-Hard, 1994, Artif. Intell.

[13] Jitendra Malik, et al. Recovering human body configurations using pairwise constraints between parts, 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[14] Uzi Vishkin, et al. Recursive Star-Tree Parallel Data Structure, 1993, SIAM J. Comput.

[15] Silvio Savarese, et al. Articulated part-based model for joint object detection and pose estimation, 2011, 2011 International Conference on Computer Vision.

[16] Yang Wang, et al. Multiple Tree Models for Occlusion and Spatial Constraints in Human Pose Estimation, 2008, ECCV.

[17] Christoph Schnörr, et al. A Study of Parts-Based Object Class Detection Using Complete Graphs, 2010, International Journal of Computer Vision.

[18] Deva Ramanan, et al. Learning to parse images of articulated bodies, 2006, NIPS.

[19] Ben Taskar, et al. Parsing human motion with stretchable models, 2011, CVPR 2011.

[20] Ben Taskar, et al. Cascaded Models for Articulated Pose Estimation, 2010, ECCV.

[21] Ailsa H. Land, et al. An Automatic Method of Solving Discrete Programming Problems, 1960.

[22] Tommi S. Jaakkola, et al. Fixing Max-Product: Convergent Message Passing Algorithms for MAP LP-Relaxations, 2007, NIPS.

[23] Yang Wang, et al. Learning hierarchical poselets for human parsing, 2011, CVPR 2011.

[24] Hao Jiang, et al. Global pose estimation using non-tree models, 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[25] Michael J. Black, et al. Measure Locally, Reason Globally: Occlusion-sensitive Articulated Pose Estimation, 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[26] Tommi S. Jaakkola, et al. Tightening LP Relaxations for MAP using Message Passing, 2008, UAI.

[27] Judea Pearl, et al. Probabilistic reasoning in intelligent systems - networks of plausible inference, 1991, Morgan Kaufmann series in representation and reasoning.

[28] Vittorio Ferrari, et al. Better Appearance Models for Pictorial Structures, 2009, BMVC.

[29] David A. Forsyth, et al. Improved Human Parsing with a Full Relational Model, 2010, ECCV.

[30] Yi Yang, et al. Articulated pose estimation with flexible mixtures-of-parts, 2011, CVPR 2011.