SCFusion: Real-time Incremental Scene Reconstruction with Semantic Completion

Real-time scene reconstruction from depth data inevitably suffers from occlusion, thus leading to incomplete 3D models. Partial reconstructions, in turn, limit the performance of algorithms that leverage them for applications in the context of, e.g., augmented reality, robotic navigation, and 3D mapping. Most methods address this issue by predicting the missing geometry as an offline optimization, thus being incompatible with real-time applications. We propose a framework that ameliorates this issue by performing scene reconstruction and semantic scene completion jointly in an incremental and real-time manner, based on an input sequence of depth maps. Our framework relies on a novel neural architecture designed to process occupancy maps and leverages voxel states to accurately and efficiently fuse semantic completion with the 3D global model. We evaluate the proposed approach quantitatively and qualitatively, demonstrating that our method can obtain accurate 3D semantic scene completion in real-time.

[1]  Ting-Chun Wang,et al.  Image Inpainting for Irregular Holes Using Partial Convolutions , 2018, ECCV.

[2]  Marc Pollefeys,et al.  Learning Priors for Semantic 3D Reconstruction , 2018, ECCV.

[3]  Matthias Nießner,et al.  Matterport3D: Learning from RGB-D Data in Indoor Environments , 2017, 2017 International Conference on 3D Vision (3DV).

[4]  Yutaka Satoh,et al.  Can Spatiotemporal 3D CNNs Retrace the History of 2D CNNs and ImageNet? , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[5]  Matthias Nießner,et al.  Real-time 3D reconstruction at scale using voxel hashing , 2013, ACM Trans. Graph..

[6]  Andrew W. Fitzgibbon,et al.  KinectFusion: Real-time dense surface mapping and tracking , 2011, 2011 10th IEEE International Symposium on Mixed and Augmented Reality.

[7]  Daniel Cohen-Or,et al.  Least-squares meshes , 2004, Proceedings Shape Modeling Applications, 2004..

[8]  Yuichi Yoshida,et al.  Spectral Normalization for Generative Adversarial Networks , 2018, ICLR.

[9]  Matthias Nießner,et al.  BundleFusion , 2016, TOGS.

[10]  Bernhard Schölkopf,et al.  Mask-Specific Inpainting with Deep Neural Networks , 2014, GCPR.

[11]  Marc Pollefeys,et al.  Joint 3D Scene Reconstruction and Class Segmentation , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[12]  Bo Yang,et al.  3D Object Reconstruction from a Single Depth View with Adversarial Learning , 2017, 2017 IEEE International Conference on Computer Vision Workshops (ICCVW).

[13]  Stefan Leutenegger,et al.  ElasticFusion: Dense SLAM Without A Pose Graph , 2015, Robotics: Science and Systems.

[14]  Sebastian Thrun,et al.  Shape from symmetry , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[15]  Thomas A. Funkhouser,et al.  Semantic Scene Completion from a Single Depth Image , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[16]  Nassir Navab,et al.  Adversarial Semantic Scene Completion from a Single Depth Image , 2018, 2018 International Conference on 3D Vision (3DV).

[17]  Marc Pollefeys,et al.  A Symmetry Prior for Convex Variational 3D Reconstruction , 2016, ECCV.

[18]  Tomoya Ishikawa,et al.  PanopticFusion: Online Volumetric Semantic Mapping at the Level of Stuff and Things , 2019, 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[19]  Marc Pollefeys,et al.  Segment based 3D object shape priors , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[20]  Stefan Leutenegger,et al.  Efficient Octree-Based Volumetric SLAM Supporting Signed-Distance and Occupancy Mapping , 2018, IEEE Robotics and Automation Letters.

[21]  Stefan Leutenegger,et al.  SemanticFusion: Dense 3D semantic mapping with convolutional neural networks , 2016, 2017 IEEE International Conference on Robotics and Automation (ICRA).

[22]  Torsten Sattler,et al.  3D Modeling on the Go: Interactive 3D Reconstruction of Large-Scale Scenes on Mobile Devices , 2015, 2015 International Conference on 3D Vision.

[23]  Matthias Nießner,et al.  RIO: 3D Object Instance Re-Localization in Changing Indoor Environments , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[24]  Andrew J. Davison,et al.  DTAM: Dense tracking and mapping in real-time , 2011, 2011 International Conference on Computer Vision.

[25]  Yu Liu,et al.  RGBD Based Dimensional Decomposition Residual Network for 3D Semantic Scene Completion , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[26]  Matthias Nießner,et al.  Scan2CAD: Learning CAD Model Alignment in RGB-D Scans , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[27]  Tim Weyrich,et al.  Real-Time 3D Reconstruction in Dynamic Scenes Using Point-Based Fusion , 2013, 2013 International Conference on 3D Vision.

[28]  W. F. Clocksin,et al.  Joint Optimization for Object Class Segmentation and Dense Stereo Reconstruction , 2012, International Journal of Computer Vision.

[29]  Jianxiong Xiao,et al.  3D ShapeNets: A deep representation for volumetric shapes , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[30]  Angela Dai,et al.  SG-NN: Sparse Generative Neural Networks for Self-Supervised Scene Completion of RGB-D Scans , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[31]  Wei Zhao,et al.  A robust hole-filling algorithm for triangular mesh , 2007, 2007 10th IEEE International Conference on Computer-Aided Design and Computer Graphics.

[32]  Silvio Savarese,et al.  3D Scene Understanding by Voxel-CRF , 2013, 2013 IEEE International Conference on Computer Vision.

[33]  Wenbin Li,et al.  InteriorNet: Mega-scale Multi-sensor Photo-realistic Indoor Scenes Dataset , 2018, BMVC.

[34]  Wolfram Burgard,et al.  OctoMap: an efficient probabilistic 3D mapping framework based on octrees , 2013, Autonomous Robots.

[35]  Charles T. Loop,et al.  A Closed-Form Bayesian Fusion Equation Using Occupancy Probabilities , 2016, 2016 Fourth International Conference on 3D Vision (3DV).

[36]  Leonidas J. Guibas,et al.  Acquiring 3D indoor environments with variability and repetition , 2012, ACM Trans. Graph..

[37]  Matthias Nießner,et al.  ScanNet: Richly-Annotated 3D Reconstructions of Indoor Scenes , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[38]  Federico Tombari,et al.  ForkNet: Multi-Branch Volumetric Semantic Completion From a Single Depth Image , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[39]  Marc Alexa,et al.  Laplacian mesh optimization , 2006, GRAPHITE '06.

[40]  Xin Tong,et al.  View-Volume Network for Semantic Scene Completion from a Single Depth Image , 2018, IJCAI.

[41]  Marc Pollefeys,et al.  Dense Semantic 3D Reconstruction , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[42]  Olaf Kähler,et al.  Very High Frame Rate Volumetric Integration of Depth Images on Mobile Devices , 2015, IEEE Transactions on Visualization and Computer Graphics.

[43]  Thomas S. Huang,et al.  Free-Form Image Inpainting With Gated Convolution , 2018, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[44]  Simon J. Julier,et al.  Structured Prediction of Unobserved Voxels from a Single Depth Image , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[45]  Derek Hoiem,et al.  Indoor Segmentation and Support Inference from RGBD Images , 2012, ECCV.

[46]  Matthias Nießner,et al.  ScanComplete: Large-Scale Scene Completion and Semantic Segmentation for 3D Scans , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[47]  Federico Tombari,et al.  Real-time and scalable incremental segmentation on dense SLAM , 2015, 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[48]  Hans P. Moravec,et al.  High resolution maps from wide angle sonar , 1985, Proceedings. 1985 IEEE International Conference on Robotics and Automation.

[49]  Shuji Oishi,et al.  VITAMIN-E: VIsual Tracking and MappINg With Extremely Dense Feature Points , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[50]  Marc Pollefeys,et al.  Class Specific 3D Object Shape Priors Using Surface Normals , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[51]  Ke Xie,et al.  A search-classify approach for cluttered indoor scene understanding , 2012, ACM Trans. Graph..