Generative 2D and 3D Human Pose Estimation with Vote Distributions

We address the problem of 2D and 3D human pose estimation using monocular camera information only. Generative approaches usually consist of two computationally demanding steps. First, different configurations of a complex 3D body model are projected into the image plane. Second, the projected synthetic person images are compared with images of real persons on the basis of features such as silhouettes or edges. To lower the computational cost of generative models, we propose to use vote distributions for anatomical landmarks, generated by one Implicit Shape Model per landmark. These vote distributions represent the image evidence in a more compact form and make it possible to use a simple 3D stick-figure body model, since the projected 3D marker points of the stick figure can be compared directly with vote locations at negligible computational cost. This allows nearly half a million different 3D poses to be evaluated per second on standard hardware, and thus a huge set of 3D pose and configuration hypotheses to be considered in each frame. The approach is evaluated on the new Utrecht Multi-Person Motion (UMPM) benchmark, where it achieves an average joint angle reconstruction error of 8.0°.
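The comparison of projected stick-figure marker points with vote locations can be illustrated with a minimal sketch. It is not the authors' implementation: the pinhole camera parameters (`focal`, `cx`, `cy`), the Gaussian kernel with width `sigma`, the `(u, v, weight)` vote format, and all function names are assumptions chosen for illustration only.

```python
import math

# Hypothetical pinhole projection; focal length and principal point are assumed values.
def project(point3d, focal=500.0, cx=320.0, cy=240.0):
    x, y, z = point3d
    return (focal * x / z + cx, focal * y / z + cy)

# Score one projected landmark against its votes.
# Each vote is (u, v, weight); a Gaussian kernel is an assumed choice of similarity.
def landmark_score(pixel, votes, sigma=10.0):
    u, v = pixel
    total = 0.0
    for vu, vv, w in votes:
        d2 = (u - vu) ** 2 + (v - vv) ** 2
        total += w * math.exp(-d2 / (2.0 * sigma ** 2))
    return total

# A pose hypothesis maps landmark names to 3D marker points of the stick figure.
def pose_score(pose3d, vote_maps, sigma=10.0):
    return sum(
        landmark_score(project(p), vote_maps[name], sigma)
        for name, p in pose3d.items()
    )

# Pick the hypothesis whose projected markers agree best with the votes.
def best_pose(hypotheses, vote_maps):
    return max(hypotheses, key=lambda pose: pose_score(pose, vote_maps))

# Toy data: one vote per landmark (illustrative values, not from the paper).
vote_maps = {
    "head": [(320.0, 140.0, 1.0)],
    "right_hand": [(420.0, 240.0, 1.0)],
}
pose_a = {"head": (0.0, -0.4, 2.0), "right_hand": (0.4, 0.0, 2.0)}
pose_b = {"head": (0.0, -0.4, 2.0), "right_hand": (-0.4, 0.0, 2.0)}
winner = best_pose([pose_a, pose_b], vote_maps)
```

Because each hypothesis costs only a handful of projections and distance evaluations, scoring a large number of poses per frame is cheap compared with rendering a full body model and matching silhouettes or edges.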
