Implicit Autoencoder for Point Cloud Self-supervised Representation Learning

This paper advocates the use of implicit surface representation in autoencoder-based self-supervised 3D representation learning. The most popular and accessible 3D representation, i.e., point clouds, involves discrete samples of the underlying continuous 3D surface. This discretization process introduces sampling variations on the 3D shape, making it challenging to develop transferable knowledge of the true 3D geometry. In the standard autoencoding paradigm, the encoder is compelled to encode not only the 3D geometry but also information on the specific discrete sampling of the 3D shape into the latent code. This is because the point cloud reconstructed by the decoder is considered unacceptable unless there is a perfect mapping between the original and the reconstructed point clouds. This paper introduces the Implicit AutoEncoder (IAE), a simple yet effective method that addresses the sampling variation issue by replacing the commonly-used point-cloud decoder with an implicit decoder. The implicit decoder reconstructs a continuous representation of the 3D shape, independent of the imperfections in the discrete samples. Extensive experiments demonstrate that the proposed IAE achieves state-of-the-art performance across various self-supervised learning benchmarks.

[1]  Jianan Li,et al.  CAGroup3D: Class-Aware Grouping for 3D Object Detection on Point Clouds , 2022, NeurIPS.

[2]  Mohamed Elhoseiny,et al.  PointNeXt: Revisiting PointNet++ with Improved Training and Scaling Strategies , 2022, NeurIPS.

[3]  Hongsheng Li,et al.  Point-M2AE: Multi-scale Masked Autoencoders for Hierarchical Point Cloud Pre-training , 2022, NeurIPS.

[4]  Yong Jae Lee,et al.  Masked Discrimination for Self-Supervised Learning on Point Clouds , 2022, ECCV.

[5]  Francis E. H. Tay,et al.  Masked Autoencoders for Point Cloud Self-supervised Learning , 2022, ECCV.

[6]  D. Tao,et al.  Contrastive Boundary Learning for Point Cloud Segmentation , 2022, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[7]  R. Rodrigo,et al.  CrossPoint: Self-Supervised Cross-Modal Contrastive Learning for 3D Point Cloud Understanding , 2022, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[8]  Y. Fu,et al.  Rethinking Network Design and Local Geometry in Point Cloud: A Simple Residual MLP Framework , 2022, ICLR.

[9]  D. Rukhovich,et al.  FCAF3D: Fully Convolutional Anchor-Free 3D Object Detection , 2021, ECCV.

[10]  Jiwen Lu,et al.  Point-BERT: Pre-training 3D Point Cloud Transformers with Masked Point Modeling , 2021, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[11]  Ross B. Girshick,et al.  Masked Autoencoders Are Scalable Vision Learners , 2021, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[12]  Bingbing Ni,et al.  Shape Self-Correction for Unsupervised Point Cloud Understanding , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).

[13]  Song-Chun Zhu,et al.  Spatio-temporal Self-Supervised Representation Learning for 3D Point Clouds , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).

[14]  Jiwen Lu,et al.  RandomRooms: Unsupervised Pre-training from Synthetic Shapes and Randomized Layouts for 3D Object Detection , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).

[15]  Pengfei Wan,et al.  SnowflakeNet: Point Cloud Completion by Snowflake Point Deconvolution with Skip-Transformer , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).

[16]  Jan Kautz,et al.  Self-Supervised Learning on 3D Point Clouds by Learning Discrete Generative Models , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[17]  Xiang Bai,et al.  PRA-Net: Point Relation-Aware Network for 3D Point Cloud Analysis , 2021, IEEE Transactions on Image Processing.

[18]  Dong Xu,et al.  Back-tracing Representative Points for Voting-based 3D Object Detection in Point Clouds , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[19]  Zheng Zhang,et al.  Group-Free 3D Object Detection via Transformers , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).

[20]  N. Barnes,et al.  Semantic Segmentation for Real Point Cloud Scenes via Bilateral Augmentation and Adaptive Fusion , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[21]  Rohit Girdhar,et al.  Self-Supervised Pretraining of 3D Features on any Point-Cloud , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).

[22]  Klaus Dietmayer,et al.  Point Transformer , 2020, IEEE Access.

[23]  Matt J. Kusner,et al.  Unsupervised Point Cloud Pre-training via Occlusion Completion , 2020, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).

[24]  Yang Liu,et al.  Unsupervised 3D Learning for Shape Analysis via Multiresolution Instance Discrimination , 2020, AAAI.

[25]  Michael C. Frank,et al.  Unsupervised neural network models of the ventral visual stream , 2020, Proceedings of the National Academy of Sciences.

[26]  Gerard Pons-Moll,et al.  Neural Unsigned Distance Fields for Implicit Function Learning , 2020, NeurIPS.

[27]  Vladimir G. Kim,et al.  Self-Supervised Learning of Point Clouds via Orientation Estimation , 2020, 2020 International Conference on 3D Vision (3DV).

[28]  Leonidas J. Guibas,et al.  PointContrast: Unsupervised Pre-training for 3D Point Cloud Understanding , 2020, ECCV.

[29]  Haitao Yang,et al.  H3DNet: 3D Object Detection Using Hybrid Geometric Primitives , 2020, ECCV.

[30]  Jun Wang,et al.  MLCVNet: Multi-Level Context VoteNet for 3D Object Detection , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[31]  Andreas Geiger,et al.  Learning Unsupervised Hierarchical Part Decomposition of 3D Objects From a Single RGB Image , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[32]  Bastian Leibe,et al.  3D-MPA: Multi-Proposal Aggregation for 3D Semantic Instance Segmentation , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[33]  Marc Pollefeys,et al.  Convolutional Occupancy Networks , 2020, ECCV.

[34]  Geoffrey E. Hinton,et al.  A Simple Framework for Contrastive Learning of Visual Representations , 2020, ICML.

[35]  Andreas Geiger,et al.  Differentiable Volumetric Rendering: Learning Implicit 3D Representations Without 3D Supervision , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[36]  Thomas Funkhouser,et al.  Local Deep Implicit Functions for 3D Shape , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[37]  Yinda Zhang,et al.  DIST: Rendering Deep Implicit Signed Distance Function With Differentiable Sphere Tracing , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[38]  A. Markham,et al.  RandLA-Net: Efficient Semantic Segmentation of Large-Scale Point Clouds , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[39]  Ross B. Girshick,et al.  Momentum Contrast for Unsupervised Visual Representation Learning , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[40]  Hao Li,et al.  Learning to Infer Implicit Surfaces without 3D Supervision , 2019, NeurIPS.

[41]  Anders P. Eriksson,et al.  Implicit Surface Representations As Layers in Neural Networks , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[42]  Kaveh Hassani,et al.  Unsupervised Multi-Task Feature Learning on Point Clouds , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[43]  Andreas Geiger,et al.  Occupancy Flow: 4D Reconstruction by Learning Particle Dynamics , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[44]  Duc Thanh Nguyen,et al.  Revisiting Point Cloud Classification: A New Benchmark Dataset and Classification Model on Real-World Data , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[45]  Matthias Zwicker,et al.  Multi-Angle Point Cloud-VAE: Unsupervised Feature Learning for 3D Point Clouds From Multiple Angles by Joint Self-Reconstruction and Half-to-Half Prediction , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[46]  Andreas Geiger,et al.  Texture Fields: Learning Texture Representations in Function Space , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[47]  Leonidas J. Guibas,et al.  Deep Hough Voting for 3D Object Detection in Point Clouds , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[48]  Leonidas J. Guibas,et al.  KPConv: Flexible and Deformable Convolution for Point Clouds , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[49]  Thomas A. Funkhouser,et al.  Learning Shape Templates With Structured Implicit Functions , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[50]  Chengxu Zhuang,et al.  Local Aggregation for Unsupervised Learning of Visual Embeddings , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[51]  Jonathan Sauder,et al.  Self-Supervised Deep Learning on Point Clouds by Reconstructing Space , 2019, NeurIPS.

[52]  Richard A. Newcombe,et al.  DeepSDF: Learning Continuous Signed Distance Functions for Shape Representation , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[53]  Sebastian Nowozin,et al.  Occupancy Networks: Learning 3D Reconstruction in Function Space , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[54]  Hao Zhang,et al.  Learning Implicit Fields for Generative Shape Modeling , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[55]  Yue Wang,et al.  Dynamic Graph CNN for Learning on Point Clouds , 2018, ACM Trans. Graph..

[56]  Olivier Bachem,et al.  Recent Advances in Autoencoder-Based Representation Learning , 2018, ArXiv.

[57]  Wei Wu,et al.  PointCNN: Convolution On X-Transformed Points , 2018, NeurIPS.

[58]  Martial Hebert,et al.  PCN: Point Completion Network , 2018, 2018 International Conference on 3D Vision (3DV).

[59]  Matthijs Douze,et al.  Deep Clustering for Unsupervised Learning of Visual Features , 2018, ECCV.

[60]  Stella X. Yu,et al.  Unsupervised Feature Learning via Non-parametric Instance Discrimination , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[61]  Jiaxin Li,et al.  SO-Net: Self-Organizing Network for Point Cloud Analysis , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[62]  Nikos Komodakis,et al.  Unsupervised Representation Learning by Predicting Image Rotations , 2018, ICLR.

[63]  Dong Tian,et al.  FoldingNet: Point Cloud Auto-Encoder via Deep Grid Deformation , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[64]  Leonidas J. Guibas,et al.  Learning Representations and Generative Models for 3D Point Clouds , 2017, ICML.

[65]  Leonidas J. Guibas,et al.  PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space , 2017, NIPS.

[66]  Matthias Nießner,et al.  ScanNet: Richly-Annotated 3D Reconstructions of Indoor Scenes , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[67]  Hao Su,et al.  A Point Set Generation Network for 3D Object Reconstruction from a Single Image , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[68]  Leonidas J. Guibas,et al.  PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[69]  Leonidas J. Guibas,et al.  A scalable active framework for region annotation in 3D shape collections , 2016, ACM Trans. Graph..

[70]  Jiajun Wu,et al.  Learning a Probabilistic Latent Space of Object Shapes via 3D Generative-Adversarial Modeling , 2016, NIPS.

[71]  Silvio Savarese,et al.  3D Semantic Parsing of Large-Scale Indoor Spaces , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[72]  Alexei A. Efros,et al.  Context Encoders: Feature Learning by Inpainting , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[73]  Paolo Favaro,et al.  Unsupervised Learning of Visual Representations by Solving Jigsaw Puzzles , 2016, ECCV.

[74]  Alexei A. Efros,et al.  Colorful Image Colorization , 2016, ECCV.

[75]  Leonidas J. Guibas,et al.  ShapeNet: An Information-Rich 3D Model Repository , 2015, ArXiv.

[76]  Jianxiong Xiao,et al.  SUN RGB-D: A RGB-D scene understanding benchmark suite , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[77]  Alexei A. Efros,et al.  Unsupervised Visual Representation Learning by Context Prediction , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[78]  Jianxiong Xiao,et al.  3D ShapeNets: A deep representation for volumetric shapes , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[79]  Pascal Vincent,et al.  Representation Learning: A Review and New Perspectives , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[80]  Andreas Geiger,et al.  Are we ready for autonomous driving? The KITTI vision benchmark suite , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[81]  Jürgen Schmidhuber,et al.  Stacked Convolutional Auto-Encoders for Hierarchical Feature Extraction , 2011, ICANN.

[82]  BengioSamy,et al.  Why Does Unsupervised Pre-training Help Deep Learning? , 2010 .

[83]  Yoshua Bengio,et al.  Why Does Unsupervised Pre-training Help Deep Learning? , 2010, AISTATS.

[84]  Leif E. Peterson K-nearest neighbor , 2009, Scholarpedia.

[85]  Yoshua Bengio,et al.  Extracting and composing robust features with denoising autoencoders , 2008, ICML '08.

[86]  Geoffrey E. Hinton,et al.  Visualizing Data using t-SNE , 2008 .

[87]  Leonidas J. Guibas,et al.  The Earth Mover's Distance as a Metric for Image Retrieval , 2000, International Journal of Computer Vision.