Deep Octree-based CNNs with Output-Guided Skip Connections for 3D Shape and Scene Completion

Acquiring complete and clean 3D shape and scene data is challenging due to geometric occlusion and insufficient views during 3D capture. We present a simple yet effective deep learning approach for completing noisy and incomplete input shapes or scenes. Our network is built upon octree-based CNNs (O-CNN) with a U-Net-like structure, which offers high computational and memory efficiency and supports the construction of very deep 3D CNNs. A novel output-guided skip connection is introduced into the network structure to better preserve the input geometry and to learn geometric priors from data effectively. We show that with these simple adaptations, namely the output-guided skip connection and a deeper O-CNN (up to 70 layers), our network achieves state-of-the-art results in 3D shape completion and semantic scene completion.
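
The abstract does not spell out how the output-guided skip connection works. One plausible reading is that encoder features are passed to the decoder only at octree nodes that also exist in the predicted output octree, so the output structure decides where input geometry is reused. The sketch below illustrates that reading with NumPy; the function name `output_guided_skip`, the integer node keys, and the additive merge are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

def output_guided_skip(enc_keys, enc_feats, dec_keys, dec_feats):
    """Merge encoder features into decoder features at shared octree nodes.

    enc_keys, dec_keys: 1-D integer arrays of unique node identifiers at one
        octree depth (hypothetical encoding, e.g. Morton codes).
    enc_feats, dec_feats: (N, C) float arrays aligned row by row with the keys.
    Returns a copy of dec_feats with encoder features added wherever the
    decoder (output) octree contains the same node as the encoder octree.
    """
    out = dec_feats.copy()
    # Indices of the nodes that are present in both octrees.
    _, enc_idx, dec_idx = np.intersect1d(enc_keys, dec_keys, return_indices=True)
    # Assumed merge rule: element-wise addition at the shared nodes only.
    out[dec_idx] += enc_feats[enc_idx]
    return out

# Toy data: the encoder has nodes 1, 4, 7; the decoder predicts nodes 4, 5, 7, 9.
enc_keys = np.array([1, 4, 7])
enc_feats = np.array([[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])
dec_keys = np.array([4, 5, 7, 9])
dec_feats = np.full((4, 2), 10.0)
merged = output_guided_skip(enc_keys, enc_feats, dec_keys, dec_feats)
```

In this toy case only nodes 4 and 7 receive encoder features. Encoder node 1 is dropped because the output octree does not contain it, and decoder nodes 5 and 9 keep their own features unchanged.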
