Learning shape models for monocular human pose estimation from the Microsoft Xbox Kinect

We propose a method for learning shape models enabling accurate articulated human pose estimation from a single image. Where previous work has typically employed simple geometric models of human limbs e.g. cylinders which lead to rectangular projections, we propose to learn a generative model of limb shape which can capture the wide variation in shape due to varying anatomy and pose. The model is learnt from silhouette, depth and 3D pose data provided by a Microsoft Xbox Kinect, such that no manual annotation is required. We employ the learnt model in a pictorial structure model framework and demonstrate improved pose estimation from single silhouettes compared to using conventional rectangular limb models.

[1]  Michael J. Black,et al.  Cardboard people: a parameterized model of articulated image motion , 1996, Proceedings of the Second International Conference on Automatic Face and Gesture Recognition.

[2]  Alex Pentland,et al.  Pfinder: Real-Time Tracking of the Human Body , 1997, IEEE Trans. Pattern Anal. Mach. Intell..

[3]  Andrew Blake,et al.  Articulated body motion capture by annealed particle filtering , 2000, Proceedings IEEE Conference on Computer Vision and Pattern Recognition. CVPR 2000 (Cat. No.PR00662).

[4]  David A. Forsyth,et al.  Finding and tracking people from the bottom up , 2003, 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings..

[5]  Jitendra Malik,et al.  Recovering human body configurations: combining segmentation and recognition , 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004..

[6]  Daniel P. Huttenlocher,et al.  Pictorial Structures for Object Recognition , 2004, International Journal of Computer Vision.

[7]  Sebastian Thrun,et al.  SCAPE: shape completion and animation of people , 2005, SIGGRAPH 2005.

[8]  Stephen J. McKenna,et al.  Human Pose Estimation Using Partial Configurations and Probabilistic Regions , 2007, International Journal of Computer Vision.

[9]  Deva Ramanan,et al.  Learning to parse images of articulated bodies , 2006, NIPS.

[10]  Michael J. Black,et al.  Detailed Human Shape and Pose from Images , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[11]  Larry S. Davis,et al.  An Interactive Approach to Pose-Assisted and Appearance-based Segmentation of Humans , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[12]  Andrew Zisserman,et al.  Progressive search space reduction for human pose estimation , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[13]  Andrew Zisserman,et al.  Long Term Arm and Hand Tracking for Continuous Sign Language TV Broadcasts , 2008, BMVC.

[14]  Vittorio Ferrari,et al.  Better Appearance Models for Pictorial Structures , 2009, BMVC.

[15]  Jitendra Malik,et al.  Poselets: Body part detectors trained using 3D human pose annotations , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[16]  Bernt Schiele,et al.  Pictorial structures revisited: People detection and articulated pose estimation , 2009, CVPR.

[17]  Hans-Peter Seidel,et al.  Learning skeletons for shape and pose , 2010, I3D '10.

[18]  Hans-Peter Seidel,et al.  Multilinear pose and body shape estimation of dressed subjects from image sets , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[19]  Michael J. Black,et al.  Contour people: A parameterized model of 2D articulated human shape , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[20]  Michael J. Black,et al.  A 2D Human Body Model Dressed in Eigen Clothing , 2010, ECCV.

[21]  Mark Everingham,et al.  Learning effective human pose estimation from inaccurate annotation , 2011, CVPR 2011.

[22]  Andrew W. Fitzgibbon,et al.  Real-time human pose recognition in parts from single depth images , 2011, CVPR 2011.