Part Segmentation of Visual Hull for 3D Human Pose Estimation

In this paper we present an algorithm for estimating the 3D pose of human targets from multiple synchronized video streams captured by a set of calibrated visual sensors. Our method uses a 3D visual hull, reconstructed from multiview image silhouettes, to estimate the skeleton and 3D pose of the human target. The key contribution of this work is to extend the predictive human pose estimation algorithms used in the Kinect gaming system to 3D visual hull data. In 3D space, viewpoint invariance is achieved by transforming the world reference frame to a human-centered reference frame. To do so, we first estimate the rigid-body orientation and translation of the target from the shape of the visual hull. We then apply discriminative classifiers in the human-centered reference frame to segment the 3D voxels of the visual hull into semantic body parts. The part clusters are then used to estimate a 3D pose that best aligns with the detected joint centers while satisfying the constraint that body parts do not intersect one another. The claims made in this work are supported by extensive experimental evaluation on both synthetic and real datasets.
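
The change of reference frame described above can be illustrated with a minimal sketch. The abstract does not say how orientation is estimated from the hull shape, so the use of principal component analysis here is an assumption, and the function name `to_body_frame` is hypothetical. The sketch takes voxel centers in world coordinates, uses their centroid as the translation and their principal axes as the orientation, and returns the voxels expressed in that body-centered frame.

```python
import numpy as np

def to_body_frame(voxels):
    """Express (N, 3) voxel centers in a body-centered frame.

    Assumption: orientation is taken from PCA of the voxel positions.
    The paper may use a different estimator.
    """
    voxels = np.asarray(voxels, dtype=float)
    translation = voxels.mean(axis=0)           # centroid of the hull
    centered = voxels - translation
    cov = centered.T @ centered / len(voxels)   # 3x3 covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)      # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]           # longest axis first
    rotation = eigvecs[:, order]                # columns are body axes in world coords
    if np.linalg.det(rotation) < 0:             # keep a right-handed frame
        rotation[:, 2] *= -1
    local = centered @ rotation                 # coordinates along the body axes
    return local, rotation, translation

# Synthetic elongated blob standing in for a visual hull (illustrative data only).
rng = np.random.default_rng(0)
body = rng.normal(size=(500, 3)) * np.array([0.9, 0.3, 0.1])
angle = np.deg2rad(40.0)
Rz = np.array([[np.cos(angle), -np.sin(angle), 0.0],
               [np.sin(angle),  np.cos(angle), 0.0],
               [0.0,            0.0,           1.0]])
world = body @ Rz.T + np.array([2.0, -1.0, 0.5])

local, R, t = to_body_frame(world)
```

PCA leaves the sign of each axis undetermined, so this sketch cannot by itself tell front from back or head from feet; a practical system would need an additional cue to resolve that ambiguity.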
