Measure Locally, Reason Globally: Occlusion-sensitive Articulated Pose Estimation

Part-based tree-structured models have been widely used for 2D articulated human pose-estimation. These approaches admit efficient inference algorithms while capturing the important kinematic constraints of the human body as a graphical model. These methods often fail however when multiple body parts fit the same image region resulting in global pose estimates that poorly explain the overall image evidence. Attempts to solve this problem have focused on the use of strong prior models that are limited to learned activities such as walking. We argue that the problem actually lies with the image observations and not with the prior. In particular, image evidence for each body part is estimated independently of other parts without regard to self-occlusion. To address this we introduce occlusion-sensitive local likelihoods that approximate the global image likelihood using per-pixel hidden binary variables that encode the occlusion relationships between parts. This occlusion reasoning introduces interactions between non-adjacent body parts creating loops in the underlying graphical model. We deal with this using an extension of an approximate belief propagation algorithm (PAMPAS). The algorithm recovers the real-valued 2D pose of the body in the presence of occlusions, does not require strong priors over body pose and does a quantitatively better job of explaining image evidence than previous methods.

[1]  Michael J. Black,et al.  Cardboard people: a parameterized model of articulated image motion , 1996, Proceedings of the Second International Conference on Automatic Face and Gesture Recognition.

[2]  Michael J. Black,et al.  Cardboard people: A parametrized model of articulated motion , 1996 .

[3]  Jitendra Malik,et al.  Tracking people with twists and exponential maps , 1998, Proceedings. 1998 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Cat. No.98CB36231).

[4]  William T. Freeman,et al.  Bayesian Reconstruction of 3D Human Motion from Single-Camera Video , 1999, NIPS.

[5]  Larry S. Davis,et al.  Real-time periodic motion detection, analysis, and applications , 1999, Proceedings. 1999 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Cat. No PR00149).

[6]  David J. Fleet,et al.  Stochastic Tracking of 3D Human Figures Using 2D Image Motion , 2000, ECCV.

[7]  Cordelia Schmid,et al.  Learning to Parse Pictures of People , 2002, ECCV.

[8]  William T. Freeman,et al.  Nonparametric belief propagation , 2003, 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings..

[9]  Ramakant Nevatia,et al.  Bayesian human segmentation in crowded situations , 2003, 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings..

[10]  David A. Forsyth,et al.  Finding and tracking people from the bottom up , 2003, 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings..

[11]  Cristian Sminchisescu,et al.  Estimating Articulated Human Motion with Covariance Scaled Sampling , 2003, Int. J. Robotics Res..

[12]  Michael Isard,et al.  PAMPAS: real-valued graphical models for computer vision , 2003, 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings..

[13]  Mun Wai Lee,et al.  Proposal maps driven MCMC for estimating human body pose in static images , 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004..

[14]  Ian D. Reid,et al.  Articulated Body Motion Capture by Stochastic Search , 2005, International Journal of Computer Vision.

[15]  Michael I. Mandel,et al.  Distributed Occlusion Reasoning for Tracking with Nonparametric Belief Propagation , 2004, NIPS.

[16]  Jitendra Malik,et al.  Recovering human body configurations: combining segmentation and recognition , 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004..

[17]  Mun Wai Lee,et al.  Human Upper Body Pose Estimation in Static Images , 2004, ECCV.

[18]  Stephen J. McKenna,et al.  Human Pose Estimation Using Learnt Probabilistic Region Similarities and Partial Configurations , 2004, ECCV.

[19]  Daniel P. Huttenlocher,et al.  Pictorial Structures for Object Recognition , 2004, International Journal of Computer Vision.

[20]  Sidharth Bhatia,et al.  Tracking loose-limbed people , 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004..

[21]  David A. Forsyth,et al.  Strike a pose: tracking people by finding stylized poses , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[22]  Jitendra Malik,et al.  Recovering human body configurations using pairwise constraints between parts , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[23]  Gang Hua,et al.  Learning to estimate human pose with data driven belief propagation , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[24]  Michael J. Black,et al.  Representing cyclic human motion using functional analysis , 2005, Image Vis. Comput..

[25]  Daniel P. Huttenlocher,et al.  Beyond trees: common-factor models for 2D human pose recovery , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.