3D Human Pose Estimation by an Annealed Two-Stage Inference Method

This paper proposes a novel human motion capture method that locates human body joint positions and reconstructs the human pose in 3D space from monocular images. We propose a two-stage framework, comprising 2D and 3D probabilistic graphical models, that handles occlusion when estimating human joint positions. Both models adopt directed acyclic structures to avoid error propagation during inference, and both use the Expectation-Maximization algorithm to learn their prior distributions. An annealed Gibbs sampling method is proposed for the two-stage framework to infer the maximum a posteriori estimates of the joint positions. The annealing process efficiently explores the modes of the distributions and finds solutions in high-dimensional space. Experiments are conducted on the HumanEva dataset to show the effectiveness of the proposed method. The experimental data are image sequences of walking motion with a full 180° turn around a region, which causes occlusion of poses and loss of image observations. Experimental results show that the proposed two-stage approach efficiently estimates human poses from monocular images with improved accuracy.
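To illustrate the general idea of annealed Gibbs sampling for maximum a posteriori inference, the following is a minimal sketch on a toy chain of discrete variables. It is not the paper's body model or its two-stage procedure: the function name `annealed_gibbs`, the potentials `unary` and `pairwise`, and the geometric cooling schedule are all assumptions made for illustration. Each variable is resampled from its conditional distribution raised to the power 1/T, and T is lowered gradually so that the sampler first explores broadly and then concentrates on a mode.

```python
import math
import random


def annealed_gibbs(unary, pairwise, n_sweeps=200, t_start=5.0, t_end=0.05, seed=0):
    """Approximate the MAP assignment of a toy chain model (illustrative only).

    unary[i][k]    : log-potential of variable i taking state k
    pairwise[a][b] : log-potential shared by every pair of neighbouring variables
    """
    rng = random.Random(seed)
    n, k = len(unary), len(unary[0])
    state = [rng.randrange(k) for _ in range(n)]

    def log_score(s):
        # Unnormalised log-posterior of a full assignment.
        total = sum(unary[i][s[i]] for i in range(n))
        total += sum(pairwise[s[i]][s[i + 1]] for i in range(n - 1))
        return total

    best, best_score = list(state), log_score(state)
    for sweep in range(n_sweeps):
        # Geometric cooling schedule from t_start down to t_end (an assumption).
        frac = sweep / max(n_sweeps - 1, 1)
        temp = t_start * (t_end / t_start) ** frac
        for i in range(n):
            # Tempered conditional log-potentials for each candidate state of variable i.
            logits = []
            for v in range(k):
                lp = unary[i][v]
                if i > 0:
                    lp += pairwise[state[i - 1]][v]
                if i < n - 1:
                    lp += pairwise[v][state[i + 1]]
                logits.append(lp / temp)
            # Subtract the maximum before exponentiating for numerical stability.
            m = max(logits)
            weights = [math.exp(l - m) for l in logits]
            state[i] = rng.choices(range(k), weights=weights)[0]
        # Keep the highest-scoring assignment visited so far.
        score = log_score(state)
        if score > best_score:
            best, best_score = list(state), score
    return best, best_score
```

As a usage example, take `unary = [[2.0, 0.0], [0.0, 0.0], [2.0, 0.0]]` and `pairwise = [[1.0, 0.0], [0.0, 1.0]]`. The middle variable has a flat unary term, loosely analogous to a joint with no image evidence, so its value is determined only through its neighbours; the sampler should settle on the assignment in which all three variables agree.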
