Unsupervised Learning of Fine Structure Generation for 3D Point Clouds by 2D Projection Matching

Learning to generate 3D point clouds without 3D supervision is an important but challenging problem. Current solutions leverage various differentiable renderers to project the generated 3D point clouds onto a 2D image plane, and train deep neural networks using the per-pixel difference with 2D ground truth images. However, these solutions still struggle to fully recover fine structures of 3D shapes, such as thin tubes or planes. To resolve this issue, we propose an unsupervised approach for 3D point cloud generation with fine structures. Specifically, we cast 3D point cloud learning as a 2D projection matching problem. Rather than using entire 2D silhouette images as regular pixel supervision, we introduce structure adaptive sampling to randomly sample 2D points within the silhouettes as irregular point supervision, which alleviates the consistency issue of sampling from different view angles. Our method pushes the neural network to generate a 3D point cloud whose 2D projections match the irregular point supervision from different view angles. Our 2D projection matching approach enables the neural network to learn more accurate structure information than the per-pixel difference does, especially for fine and thin 3D structures. Our method can recover fine 3D structures from 2D silhouette images at different resolutions, and is robust to different sampling methods and to the number of points in the irregular point supervision. Our method outperforms others on widely used benchmarks. Our code, data and models are available at https://github.com/chenchao15/2D_projection_matching.

* indicates equal contribution. This work was supported by the National Key R&D Program of China (2020YFF0304100), the National Natural Science Foundation of China (62072268), and in part by the Tsinghua-Kuaishou Institute of Future Media Data, and NSF (award 1813583). The corresponding author is Yu-Shen Liu.
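The projection matching idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the uniform sampling inside the mask stands in for the structure adaptive sampling, the orthographic projection is an assumed camera model, and the symmetric Chamfer distance is a stand-in matching loss, since the abstract does not specify any of these. All function names (`sample_points_in_silhouette`, `project_orthographic`, `chamfer_2d`) are hypothetical.

```python
import math
import random


def sample_points_in_silhouette(mask, n_points, rng):
    """Randomly sample continuous 2D points (x, y) inside a binary silhouette.

    Assumption: plain uniform sampling over foreground pixels, used here only
    as a placeholder for the paper's structure adaptive sampling.
    """
    inside = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    points = []
    for _ in range(n_points):
        x, y = rng.choice(inside)
        # Jitter within the chosen pixel so samples are not locked to the grid.
        points.append((x + rng.random(), y + rng.random()))
    return points


def project_orthographic(points_3d):
    """Project 3D points to 2D by dropping depth (assumed camera model)."""
    return [(x, y) for x, y, _z in points_3d]


def chamfer_2d(a, b):
    """Symmetric Chamfer distance between two 2D point sets (stand-in loss)."""
    def one_way(src, dst):
        return sum(min(math.dist(p, q) for q in dst) for p in src) / len(src)
    return one_way(a, b) + one_way(b, a)


# A thin horizontal bar as the silhouette: the kind of fine structure
# that the abstract says per-pixel losses struggle with.
rng = random.Random(0)
mask = [[4 <= x < 28 and 10 <= y < 12 for x in range(32)] for y in range(32)]
targets = sample_points_in_silhouette(mask, 64, rng)

# A hypothetical "generated" point cloud; a network would produce this.
cloud = [(rng.uniform(4, 28), rng.uniform(10, 12), rng.uniform(-1, 1))
         for _ in range(64)]
loss = chamfer_2d(project_orthographic(cloud), targets)
```

In a training setup, a loss of this kind would be computed for each view angle and minimized with respect to the generated 3D points; the sketch only evaluates it once for a single view.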
