3D Reconstruction from RGB-D Data

A key task in computer vision is generating virtual 3D models of real-world scenes by reconstructing the shape, appearance and, in the case of dynamic scenes, motion of the scene from visual sensors. Low-cost video plus depth (RGB-D) sensors have recently become widely available and have been applied to 3D reconstruction of both static and dynamic scenes. RGB-D sensors contain an active depth sensor, which provides a stream of depth maps alongside standard colour video. Their low cost, ease of use, and video-rate capture of images along with depth make them well suited to 3D reconstruction. Active depth capture overcomes some of the limitations of passive monocular or multiple-view video-based approaches, since reliable, metrically accurate estimates of the scene depth at each pixel can be obtained from a single view, even in scenes that lack distinctive texture. There are two key components to 3D reconstruction from RGB-D data: (1) spatial alignment of the surface over time, and (2) fusion of noisy, partial surface measurements into a more complete, consistent 3D model. In the case of static scenes, the sensor is typically moved around the scene and its pose is estimated over time. For dynamic scenes, there may be multiple rigid, articulated, or non-rigidly deforming surfaces to be tracked over time. The fusion component integrates the aligned surface measurements, typically using an intermediate representation such as the volumetric truncated signed distance field (TSDF). In this chapter, we discuss key recent approaches to 3D reconstruction from depth or RGB-D input, with an emphasis on real-time reconstruction of static scenes.
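As a rough illustration of the fusion component, the sketch below integrates several noisy depth measurements into a TSDF along a single camera ray, using a per-voxel running weighted average, and then recovers the surface at the zero crossing. It is a minimal one-dimensional sketch and not the method of any particular system: the function names `fuse` and `zero_crossing`, the uniform weight of 1 per measurement, and the assumption that the measurements are already aligned are all choices made here for illustration.

```python
def fuse(tsdf, weights, voxel_depths, measured_depth, trunc):
    """Integrate one (already aligned) depth measurement into the TSDF.

    Each voxel stores a running weighted average of truncated signed
    distances: positive in front of the surface, negative behind it.
    """
    for i, z in enumerate(voxel_depths):
        sdf = measured_depth - z
        if sdf < -trunc:
            continue  # far behind the observed surface: occluded, skip
        d = min(trunc, sdf)  # clamp free-space distances to +trunc
        w = 1.0  # assumed uniform weight per measurement
        tsdf[i] = (weights[i] * tsdf[i] + w * d) / (weights[i] + w)
        weights[i] += w


def zero_crossing(tsdf, weights, voxel_depths):
    """Return the depth at which the fused TSDF changes sign, or None."""
    for i in range(len(tsdf) - 1):
        observed = weights[i] > 0 and weights[i + 1] > 0
        if observed and tsdf[i] > 0 >= tsdf[i + 1]:
            # Linear interpolation between the two neighbouring voxels.
            t = tsdf[i] / (tsdf[i] - tsdf[i + 1])
            return voxel_depths[i] + t * (voxel_depths[i + 1] - voxel_depths[i])
    return None


if __name__ == "__main__":
    voxel_depths = [0.1 * i for i in range(21)]  # voxels from 0 m to 2 m
    tsdf = [0.0] * len(voxel_depths)
    weights = [0.0] * len(voxel_depths)
    for depth in [0.98, 1.03, 1.01, 0.99, 1.04]:  # noisy samples of one surface
        fuse(tsdf, weights, voxel_depths, depth, trunc=0.3)
    print(zero_crossing(tsdf, weights, voxel_depths))
```

Averaging in the implicit representation lets each new measurement refine the surface estimate without storing past depth maps; in a full system the same update would run over a 3D voxel grid, with each voxel projected into the current depth map using the estimated sensor pose.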
