Fast deformable model-based human performance capture and FVV using consumer-grade RGB-D sensors

Abstract This paper proposes a novel end-to-end system for the fast reconstruction of human actor performances into 3D mesh sequences, using input from a small set of consumer-grade RGB-Depth sensors. The framework pre-reconstructs a deformable 3D model of the actor offline and uses it to constrain the online reconstruction process, thereby implicitly tracking the human motion. By handling non-rigid deformation of the 3D surface and applying appropriate texture mapping, it produces a dynamic sequence of temporally coherent textured meshes, enabling realistic Free Viewpoint Video (FVV). Given the noisy input from a small set of low-cost sensors, the focus is on fast (“quick-post”), robust and fully automatic performance reconstruction. Apart from integrating existing ideas into a complete end-to-end system, which is itself a challenging task, several novel technical advances contribute to the speed, robustness and fidelity of the system: a layered approach to model-based pose tracking, the definition and use of sophisticated energy functions that can be parallelized on the GPU, and a new texture mapping scheme. Experimental results on a large number of challenging sequences, together with comparisons against model-based and model-free approaches, demonstrate the efficiency of the proposed approach.
