General Automatic Human Shape and Motion Capture Using Volumetric Contour Cues

Markerless motion capture algorithms require a 3D body model with properly personalized skeleton dimensions and/or body shape and appearance to successfully track a person. Unfortunately, many tracking methods treat model personalization as a separate problem and rely on manual or semi-automatic model initialization, which greatly reduces their applicability. In this paper, we propose a fully automatic algorithm that jointly creates a rigged actor model of the kind commonly used for animation – skeleton, volumetric shape, appearance, and optionally a body surface – and estimates the actor’s motion from multi-view video input only. The approach is rigorously designed to work on footage of general outdoor scenes recorded with very few cameras and without background subtraction. Our method uses a new image formation model with analytic visibility and an analytically differentiable alignment energy. For reconstruction, 3D body shape is approximated as a Gaussian density field. For pose and shape estimation, we minimize a new edge-based alignment energy inspired by volume ray casting in an absorbing medium. We further propose a new statistical human body model that represents the body surface, volumetric Gaussian density, and variability in skeleton shape. Given any multi-view sequence, our method jointly optimizes the pose and shape parameters of this model fully automatically in a spatiotemporal way.
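
The abstract does not give the image formation model itself, so the following is only a minimal sketch of the general idea it alludes to: casting a ray through a density field built from Gaussians and treating that field as an absorbing medium. It assumes isotropic Gaussians and plain Beer-Lambert absorption; the function names (`optical_depth`, `transmittance`) and the `(mean, sigma, density)` tuple layout are hypothetical and not taken from the paper. The point it illustrates is that the line integral of a Gaussian along a ray has a closed form, which is what makes visibility analytic and differentiable.

```python
import math

def optical_depth(origin, direction, gaussians):
    """Closed-form integral of a sum-of-Gaussians density along a ray.

    The ray is x(s) = origin + s * n for s in [0, inf), with n the
    normalized direction. Each Gaussian is a hypothetical tuple
    (mean, sigma, density) describing an isotropic 3D blob.
    """
    norm = math.sqrt(sum(d * d for d in direction))
    n = [d / norm for d in direction]
    total = 0.0
    for mean, sigma, density in gaussians:
        rel = [m - o for m, o in zip(mean, origin)]
        # Ray parameter of the point closest to the Gaussian centre.
        s0 = sum(r * k for r, k in zip(rel, n))
        # Squared perpendicular distance from the centre to the ray.
        d2 = max(sum(r * r for r in rel) - s0 * s0, 0.0)
        # Integral over s >= 0 of exp(-((s - s0)^2 + d2) / (2 sigma^2)).
        along = sigma * math.sqrt(math.pi / 2.0) * (
            1.0 + math.erf(s0 / (sigma * math.sqrt(2.0)))
        )
        total += density * math.exp(-d2 / (2.0 * sigma ** 2)) * along
    return total

def transmittance(origin, direction, gaussians):
    """Fraction of light surviving the ray (Beer-Lambert assumption)."""
    return math.exp(-optical_depth(origin, direction, gaussians))

if __name__ == "__main__":
    blob = [((0.0, 0.0, 0.0), 1.0, 1.0)]
    # A ray through the blob centre is strongly absorbed.
    print(transmittance((0.0, 0.0, -100.0), (0.0, 0.0, 1.0), blob))
    # A ray passing far from the blob is essentially unobstructed.
    print(transmittance((100.0, 0.0, -100.0), (0.0, 0.0, 1.0), blob))
```

Because every term is a smooth function of the Gaussian means and widths, the transmittance can be differentiated with respect to pose and shape parameters without sampling along the ray, which is the property the abstract emphasizes.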
