DI-Fusion: Online Implicit 3D Reconstruction with Deep Priors

Previous online 3D dense reconstruction methods often cost massive memory storage while achieving unsatisfactory surface quality mainly due to the usage of stagnant underlying geometry representation, such as TSDF (truncated signed distance functions) or surfels, without any knowledge of the scene priors. In this paper, we present DI-Fusion (Deep Implicit Fusion), based on a novel 3D representation, called Probabilistic Local Implicit Voxels (PLIVoxs), for online 3D reconstruction using a commodity RGB-D camera. Our PLIVox encodes scene priors considering both the local geometry and uncertainty parameterized by a deep neural network. With such deep priors, we demonstrate by extensive experiments that we are able to perform online implicit 3D reconstruction achieving state-of-the-art mapping quality and camera trajectory estimation accuracy, while taking much less storage compared with previous online 3D reconstruction approaches.

[1]  Yinda Zhang,et al.  DIST: Rendering Deep Implicit Signed Distance Function With Differentiable Sphere Tracing , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[2]  Matthias Nießner,et al.  State of the Art on 3D Reconstruction with RGB‐D Cameras , 2018, Comput. Graph. Forum.

[3]  Daniel Cremers,et al.  Dense visual SLAM for RGB-D cameras , 2013, 2013 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[4]  Gordon Wetzstein,et al.  Implicit Neural Representations with Periodic Activation Functions , 2020, NeurIPS.

[5]  Ming Zeng,et al.  Octree-based fusion for realtime 3D reconstruction , 2013, Graph. Model..

[6]  Matthias Nießner,et al.  BundleFusion , 2016, TOGS.

[7]  Pablo Ramon Soria,et al.  Geometric Priors for Gaussian Process Implicit Surfaces , 2017, IEEE Robotics and Automation Letters.

[8]  Vladlen Koltun,et al.  Robust reconstruction of indoor scenes , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[9]  Andrew W. Fitzgibbon,et al.  KinectFusion: Real-time dense surface mapping and tracking , 2011, 2011 10th IEEE International Symposium on Mixed and Augmented Reality.

[10]  Federico Tombari,et al.  CNN-SLAM: Real-Time Dense Monocular SLAM with Learned Depth Prediction , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[11]  Matthias Nießner,et al.  ScanNet: Richly-Annotated 3D Reconstructions of Indoor Scenes , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[12]  Fang-Lue Zhang,et al.  WallNet: Reconstructing General Room Layouts from RGB Images , 2020, Graph. Model..

[13]  Hongbin Zha,et al.  PSDF Fusion: Probabilistic Signed Distance Function for On-the-fly 3D Data Fusion and Scene Reconstruction , 2018, ECCV.

[14]  Juan D. Tardós,et al.  ORB-SLAM2: An Open-Source SLAM System for Monocular, Stereo, and RGB-D Cameras , 2016, IEEE Transactions on Robotics.

[15]  Stefan Leutenegger,et al.  ElasticFusion: Dense SLAM Without A Pose Graph , 2015, Robotics: Science and Systems.

[16]  Daniel Cremers,et al.  Direct Sparse Odometry , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[17]  Siddhartha Chaudhuri,et al.  BAE-NET: Branched Autoencoder for Shape Co-Segmentation , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[18]  Leonidas J. Guibas,et al.  PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[19]  Stefan Leutenegger,et al.  DeepFusion: Real-Time Dense 3D Reconstruction for Monocular SLAM using Single-View Depth and Gradient Predictions , 2019, 2019 International Conference on Robotics and Automation (ICRA).

[20]  Olaf Kähler,et al.  Very High Frame Rate Volumetric Integration of Depth Images on Mobile Devices , 2015, IEEE Transactions on Visualization and Computer Graphics.

[21]  Marc Levoy,et al.  A volumetric method for building complex models from range images , 1996, SIGGRAPH.

[22]  Olivier Stasse,et al.  MonoSLAM: Real-Time Single Camera SLAM , 2007, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[23]  Eddy Ilg,et al.  Deep Local Shapes: Learning Local SDF Priors for Detailed 3D Reconstruction , 2020, ECCV.

[24]  Natalia Gimelshein,et al.  PyTorch: An Imperative Style, High-Performance Deep Learning Library , 2019, NeurIPS.

[25]  Thomas Funkhouser,et al.  Local Implicit Grid Representations for 3D Scenes , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[26]  Andrew J. Davison,et al.  A benchmark for RGB-D visual odometry, 3D reconstruction and SLAM , 2014, 2014 IEEE International Conference on Robotics and Automation (ICRA).

[27]  Hao Zhang,et al.  Learning Implicit Fields for Generative Shape Modeling , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[28]  Yee Whye Teh,et al.  Conditional Neural Processes , 2018, ICML.

[29]  Tai-Jiang Mu,et al.  ClusterVO: Clustering Moving Instances and Estimating Visual Odometry for Self and Surroundings , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[30]  Stefan Leutenegger,et al.  CodeSLAM - Learning a Compact, Optimisable Representation for Dense Visual SLAM , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[31]  Andreas Geiger,et al.  Learning 3D Shape Completion Under Weak Supervision , 2018, International Journal of Computer Vision.

[32]  Dieter Fox,et al.  DynamicFusion: Reconstruction and tracking of non-rigid scenes in real-time , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[33]  Matthias Nießner,et al.  Real-time 3D reconstruction at scale using voxel hashing , 2013, ACM Trans. Graph..

[34]  Richard A. Newcombe,et al.  DeepSDF: Learning Continuous Signed Distance Functions for Shape Representation , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[35]  Daniel Cremers,et al.  LSD-SLAM: Large-Scale Direct Monocular SLAM , 2014, ECCV.

[36]  William E. Lorensen,et al.  Marching cubes: A high resolution 3D surface construction algorithm , 1987, SIGGRAPH.

[37]  Gordon Wetzstein,et al.  MetaSDF: Meta-learning Signed Distance Functions , 2020, NeurIPS.

[38]  Leif Kobbelt,et al.  Noise-Resilient Reconstruction of Panoramas and 3D Scenes Using Robot-Mounted Unsynchronized Commodity RGB-D Cameras , 2020, ACM Trans. Graph..

[39]  Tim Weyrich,et al.  Real-Time 3D Reconstruction in Dynamic Scenes Using Point-Based Fusion , 2013, 2013 International Conference on 3D Vision.

[40]  Timothy D. Barfoot,et al.  State Estimation for Robotics , 2017 .

[41]  Jean-François Lalonde,et al.  Deep 6-DOF Tracking , 2017, IEEE Transactions on Visualization and Computer Graphics.

[42]  Leonidas J. Guibas,et al.  ShapeNet: An Information-Rich 3D Model Repository , 2015, ArXiv.

[43]  Shi-Min Hu,et al.  ClusterSLAM: A SLAM backend for simultaneous rigid body clustering and motion estimation , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[44]  Pratul P. Srinivasan,et al.  NeRF , 2020, ECCV.

[45]  Stefan Leutenegger,et al.  Fusion++: Volumetric Object-Level SLAM , 2018, 2018 International Conference on 3D Vision (3DV).

[46]  Matthias Nießner,et al.  ScanComplete: Large-Scale Scene Completion and Semantic Segmentation for 3D Scans , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[47]  Olaf Kähler,et al.  InfiniTAM v3: A Framework for Large-Scale 3D Reconstruction with Loop Closure , 2017, ArXiv.

[48]  Andrew Davison,et al.  NodeSLAM: Neural Object Descriptors for Multi-View Shape Reconstruction , 2020, 2020 International Conference on 3D Vision (3DV).

[49]  Daniel D. Lee,et al.  Online Continuous Mapping using Gaussian Process Implicit Surfaces , 2019, 2019 International Conference on Robotics and Automation (ICRA).

[50]  John J. Leonard,et al.  Real-time large-scale dense RGB-D SLAM with volumetric fusion , 2014, Int. J. Robotics Res..

[51]  Marc Pollefeys,et al.  RoutedFusion: Learning Real-Time Depth Map Fusion , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[52]  Sebastian Nowozin,et al.  Occupancy Networks: Learning 3D Reconstruction in Function Space , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[53]  Jiawen Chen,et al.  Scalable real-time volumetric surface reconstruction , 2013, ACM Trans. Graph..