DiffComplete: Diffusion-based Generative 3D Shape Completion

We introduce a new diffusion-based approach for shape completion on 3D range scans. Compared with prior deterministic and probabilistic methods, our approach strikes a balance between realism, multi-modality, and high fidelity. We propose DiffComplete, which casts shape completion as a generative task conditioned on the incomplete shape. Our key designs are two-fold. First, we devise a hierarchical feature aggregation mechanism that injects conditional features in a spatially consistent manner, so that the model captures both the local details and the broader context of the conditional inputs when controlling the shape completion. Second, we propose an occupancy-aware fusion strategy that enables the completion of multiple partial shapes and offers greater flexibility in the input conditions. DiffComplete sets a new state of the art (e.g., a 40% decrease in ℓ1 error) on two large-scale 3D shape completion benchmarks. Our completed shapes look more realistic than those produced by deterministic methods and are closer to the ground truth than those produced by probabilistic alternatives. Furthermore, DiffComplete generalizes well to objects from entirely unseen classes, on both synthetic and real data, eliminating the need to retrain the model for different applications.
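To make the two ingredients above more concrete, the following is a minimal NumPy sketch, not the DiffComplete implementation. It assumes that the complete shape and every partial scan are dense voxel grids of the same resolution (for instance, truncated distance fields). It shows (a) the standard DDPM forward noising step that a conditional diffusion model is trained to invert, using an assumed linear noise schedule, and (b) a hypothetical occupancy-aware fusion of several partial scans, realized here as a per-voxel average taken only over the scans that actually observed each voxel. The function names `linear_alpha_bar`, `noise_complete_shape`, and `fuse_partial_scans` are illustrative and do not come from the paper.

```python
import numpy as np


def linear_alpha_bar(num_steps: int, beta_start: float = 1e-4,
                     beta_end: float = 2e-2) -> np.ndarray:
    """Cumulative products alpha_bar_t of a linear DDPM noise schedule.

    The linear schedule is an assumed choice for illustration only.
    """
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)


def noise_complete_shape(x0: np.ndarray, t: int, alpha_bar: np.ndarray,
                         rng: np.random.Generator):
    """Forward diffusion q(x_t | x_0) applied to a complete shape grid x0.

    Returns the noisy grid x_t and the Gaussian noise eps, which a
    denoiser conditioned on the partial scan would learn to predict.
    """
    eps = rng.standard_normal(x0.shape)
    a = alpha_bar[t]
    x_t = np.sqrt(a) * x0 + np.sqrt(1.0 - a) * eps
    return x_t, eps


def fuse_partial_scans(scans, masks, tiny: float = 1e-8):
    """Hypothetical occupancy-aware fusion of K partial scans.

    scans: list of K grids of shape (D, H, W) holding per-voxel values.
    masks: list of K boolean grids marking the voxels each scan observed.
    Each voxel is averaged only over the scans that observed it, so scans
    that missed a region do not dilute the values of the others.
    Voxels that no scan observed are set to 0 and flagged in `observed`.
    """
    stacked = np.stack(scans).astype(float)         # (K, D, H, W)
    weights = np.stack(masks).astype(float)         # (K, D, H, W)
    weight_sum = weights.sum(axis=0)                # scans per voxel
    fused = (stacked * weights).sum(axis=0) / np.maximum(weight_sum, tiny)
    observed = weight_sum > 0
    return fused, observed


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    alpha_bar = linear_alpha_bar(1000)
    x0 = rng.standard_normal((8, 8, 8))             # stand-in complete shape
    x_t, eps = noise_complete_shape(x0, t=500, alpha_bar=alpha_bar, rng=rng)

    scan_a = rng.standard_normal((8, 8, 8))
    scan_b = rng.standard_normal((8, 8, 8))
    mask_a = np.zeros((8, 8, 8), dtype=bool)
    mask_a[:4] = True                               # scan A sees one half
    mask_b = np.zeros((8, 8, 8), dtype=bool)
    mask_b[2:] = True                               # scan B overlaps A
    condition, observed = fuse_partial_scans([scan_a, scan_b], [mask_a, mask_b])
    print(x_t.shape, condition.shape, observed.mean())
```

In a full conditional model, a denoiser would receive `x_t`, the fused condition, and the step `t`, and be trained to regress `eps`. How DiffComplete injects the condition into the denoiser at multiple resolutions (its hierarchical feature aggregation) is not reproduced here; the sketch only illustrates the diffusion target and one plausible way of merging overlapping partial inputs.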
