UnstructuredFusion: Realtime 4D Geometry and Texture Reconstruction Using Commercial RGBD Cameras

High-quality 4D geometry and texture reconstruction of human activities usually requires multiview perception via a highly structured multi-camera setup, where both the specially designed cameras and the tedious pre-calibration limit the adoption of professional multi-camera systems in daily applications. In this paper, we propose UnstructuredFusion, a practical realtime markerless human performance capture method using unstructured commercial RGBD cameras. Alongside a flexible hardware setup of just three unstructured RGBD cameras without any careful pre-calibration, the challenging problem of 4D reconstruction from multiple asynchronous videos is solved by three novel technical contributions: online multi-camera calibration, skeleton-warping-based non-rigid tracking, and temporal-blending-based atlas texturing. The underlying insight lies in the strong global constraints of the human body and human motion, which are modeled by the skeleton and the skeleton warping, respectively. Extensive experiments, such as placing the three cameras flexibly in a handheld manner, demonstrate that UnstructuredFusion achieves high-quality 4D geometry and texture reconstruction without tiresome pre-calibration, removing the cumbersome hardware and software restrictions of conventional structured multi-camera systems while eliminating the inherent occlusion issues of a single-camera setup.
