Paper information - General Automatic Human Shape and Motion Capture Using Volumetric Contour Cues

Markerless motion capture algorithms require a 3D body model with properly personalized skeleton dimensions and/or body shape and appearance to successfully track a person. Unfortunately, many tracking methods treat model personalization as a separate problem and rely on manual or semi-automatic model initialization, which greatly reduces applicability. In this paper, we propose a fully automatic algorithm that jointly creates a rigged actor model commonly used for animation (skeleton, volumetric shape, appearance, and optionally a body surface) and estimates the actor's motion from multi-view video input only. The approach is rigorously designed to work on footage of general outdoor scenes recorded with very few cameras and without background subtraction. Our method uses a new image formation model with analytic visibility and an analytically differentiable alignment energy. For reconstruction, the 3D body shape is approximated as a Gaussian density field. For pose and shape estimation, we minimize a new edge-based alignment energy inspired by volume ray casting in an absorbing medium. We further propose a new statistical human body model that represents the body surface, volumetric Gaussian density, and variability in skeleton shape. Given any multi-view sequence, our method jointly optimizes the pose and shape parameters of this model fully automatically in a spatiotemporal way.
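To make the idea of ray casting through a Gaussian density field more concrete, the following is a minimal sketch and not the paper's actual formulation. It assumes isotropic Gaussian blobs, Beer-Lambert absorption, and a ray that is integrated over the whole line (reasonable only when the camera is far outside every blob). The function name `ray_transmittance` and the `(center, sigma, density)` tuple layout are hypothetical choices made for illustration. Because the line integral of a Gaussian has a closed form, the transmittance needs no sampling and is smooth in the blob parameters.

```python
import math


def ray_transmittance(origin, direction, gaussians):
    """Fraction of light surviving along a ray through a sum of isotropic
    Gaussian densities, under Beer-Lambert absorption.

    Illustrative assumptions (not taken from the paper):
    - each blob is (center, sigma, density) with an isotropic Gaussian profile
    - the ray is integrated over the full line, so the camera is assumed to
      sit far outside every blob
    """
    norm = math.sqrt(sum(x * x for x in direction))
    d = [x / norm for x in direction]  # unit ray direction

    optical_depth = 0.0
    for center, sigma, density in gaussians:
        # Vector from the ray origin to the blob center.
        v = [c - o for c, o in zip(center, origin)]
        # Signed distance along the ray to the point closest to the center.
        t = sum(a * b for a, b in zip(v, d))
        # Squared perpendicular distance from the center to the ray
        # (clamped to guard against tiny negative rounding errors).
        perp_sq = max(0.0, sum(a * a for a in v) - t * t)
        # Closed-form line integral of density * exp(-r^2 / (2 sigma^2)).
        optical_depth += (
            density
            * sigma
            * math.sqrt(2.0 * math.pi)
            * math.exp(-perp_sq / (2.0 * sigma * sigma))
        )

    # Beer-Lambert law: transmittance decays exponentially with optical depth.
    return math.exp(-optical_depth)


if __name__ == "__main__":
    blobs = [((0.0, 0.0, 0.0), 1.0, 1.0)]
    # A ray through the blob center is attenuated far more than a distant one.
    print(ray_transmittance((0.0, 0.0, -10.0), (0.0, 0.0, 1.0), blobs))
    print(ray_transmittance((5.0, 0.0, -10.0), (0.0, 0.0, 1.0), blobs))
```

In a sketch like this, one minus the transmittance behaves like a soft silhouette value per pixel, which hints at why a smooth density model can yield contour cues without hard background subtraction.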
