Self-Supervised Learning on 3D Point Clouds by Learning Discrete Generative Models

While recent pre-training tasks on 2D images have proven very successful for transfer learning, pre-training for 3D data remains challenging. In this work, we introduce a general method for 3D self-supervised representation learning that 1) remains agnostic to the underlying neural network architecture, and 2) specifically leverages the geometric nature of 3D point cloud data. The proposed task softly segments 3D points into a discrete number of geometric partitions. A self-supervised loss is formed under the interpretation that these soft partitions implicitly parameterize a latent Gaussian Mixture Model (GMM), and that this generative model establishes a data likelihood function. Our pretext task can therefore be viewed in terms of an encoder-decoder paradigm that squeezes learned representations through an implicitly defined parametric discrete generative model bottleneck. We show that any existing neural network architecture designed for supervised point cloud segmentation can be repurposed for the proposed unsupervised pretext task. By maximizing data likelihood with respect to the soft partitions formed by the unsupervised point-wise segmentation network, learned representations are encouraged to contain compositionally rich geometric information. In tests, we show that our method naturally induces semantic separation in feature space, resulting in state-of-the-art performance on downstream applications like model classification and semantic segmentation.

[1]  Paolo Favaro,et al.  Unsupervised Learning of Visual Representations by Solving Jigsaw Puzzles , 2016, ECCV.

[2]  Jonathan Sauder,et al.  Self-Supervised Deep Learning on Point Clouds by Reconstructing Space , 2019, NeurIPS.

[3]  Jie Zhou,et al.  Global-Local Bidirectional Reasoning for Unsupervised Representation Learning of 3D Point Clouds , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[4]  Michal Valko,et al.  Bootstrap Your Own Latent: A New Approach to Self-Supervised Learning , 2020, NeurIPS.

[5]  Vladlen Koltun,et al.  Fully Convolutional Geometric Features , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[6]  Jan Kautz,et al.  DeepGMR: Learning Latent Gaussian Mixture Models for Registration , 2020, ECCV.

[7]  Leonidas J. Guibas,et al.  PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space , 2017, NIPS.

[8]  Geoffrey E. Hinton,et al.  Visualizing Data using t-SNE , 2008 .

[9]  Leonidas J. Guibas,et al.  PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[10]  Zhengyou Zhang,et al.  Microsoft Kinect Sensor and Its Effect , 2012, IEEE Multim..

[11]  Yoshua Bengio,et al.  Generative Adversarial Nets , 2014, NIPS.

[12]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[13]  Matthias Zwicker,et al.  View Inter-Prediction GAN: Unsupervised Representation Learning for 3D Shapes by Learning Global Shape Memories to Support Local View Predictions , 2018, AAAI.

[14]  Ming-Yu Liu,et al.  PointFlow: 3D Point Cloud Generation With Continuous Normalizing Flows , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[15]  Dong Tian,et al.  FoldingNet: Point Cloud Auto-Encoder via Deep Grid Deformation , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[16]  Max Welling,et al.  Auto-Encoding Variational Bayes , 2013, ICLR.

[17]  Leonidas J. Guibas,et al.  Learning Representations and Generative Models for 3D Point Clouds , 2017, ICML.

[18]  Joan Lasenby,et al.  Pre-Training by Completing Point Clouds , 2021, ArXiv.

[19]  Silvio Savarese,et al.  4D Spatio-Temporal ConvNets: Minkowski Convolutional Neural Networks , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[20]  Yue Wang,et al.  Dynamic Graph CNN for Learning on Point Clouds , 2018, ACM Trans. Graph..

[21]  Vladimir G. Kim,et al.  Self-Supervised Learning of Point Clouds via Orientation Estimation , 2020, 2020 International Conference on 3D Vision (3DV).

[22]  Wolfram Burgard,et al.  OctoMap: an efficient probabilistic 3D mapping framework based on octrees , 2013, Autonomous Robots.

[23]  Leonidas J. Guibas,et al.  ShapeNet: An Information-Rich 3D Model Repository , 2015, ArXiv.

[24]  Silvio Savarese,et al.  3D Semantic Parsing of Large-Scale Indoor Spaces , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[25]  Kaiming He,et al.  Momentum Contrast for Unsupervised Visual Representation Learning , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[26]  Geoffrey E. Hinton,et al.  Big Self-Supervised Models are Strong Semi-Supervised Learners , 2020, NeurIPS.

[27]  Alan F. Smeaton,et al.  Contrastive Representation Learning: A Framework and Review , 2020, IEEE Access.

[28]  Ming Ouhyoung,et al.  On Visual Similarity Based 3D Model Retrieval , 2003, Comput. Graph. Forum.

[29]  Matthias Nießner,et al.  ScanNet: Richly-Annotated 3D Reconstructions of Indoor Scenes , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[30]  Jianxiong Xiao,et al.  3D ShapeNets: A deep representation for volumetric shapes , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[31]  Ali Razavi,et al.  Data-Efficient Image Recognition with Contrastive Predictive Coding , 2019, ICML.

[32]  Yue Wang,et al.  PointGrow: Autoregressively Learned Point Cloud Generation with Self-Attention , 2018, 2020 IEEE Winter Conference on Applications of Computer Vision (WACV).

[33]  Qi Tian,et al.  AVT: Unsupervised Learning of Transformation Equivariant Representations by Autoencoding Variational Transformations , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[34]  Max Welling,et al.  Semi-Supervised Classification with Graph Convolutional Networks , 2016, ICLR.

[35]  Matthias Nießner,et al.  Real-time 3D reconstruction at scale using voxel hashing , 2013, ACM Trans. Graph..

[36]  Nikos Komodakis,et al.  Unsupervised Representation Learning by Predicting Image Rotations , 2018, ICLR.

[37]  Leonidas J. Guibas,et al.  PointContrast: Unsupervised Pre-training for 3D Point Cloud Understanding , 2020, ECCV.

[38]  Oriol Vinyals,et al.  Representation Learning with Contrastive Predictive Coding , 2018, ArXiv.

[39]  Laurens van der Maaten,et al.  Self-Supervised Learning of Pretext-Invariant Representations , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[40]  Geoffrey E. Hinton,et al.  A Simple Framework for Contrastive Learning of Visual Representations , 2020, ICML.

[41]  Song Han,et al.  Searching Efficient 3D Architectures with Sparse Point-Voxel Convolution , 2020, ECCV.

[42]  Cyrill Stachniss,et al.  SemanticKITTI: A Dataset for Semantic Scene Understanding of LiDAR Sequences , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[43]  Alexei A. Efros,et al.  Unsupervised Visual Representation Learning by Context Prediction , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[44]  Abhinav Gupta,et al.  Learning a Predictable and Generative Vector Representation for Objects , 2016, ECCV.

[45]  Anders Grunnet-Jepsen,et al.  Intel RealSense Stereoscopic Depth Cameras , 2017, CVPR 2017.

[46]  Subhransu Maji,et al.  Multiresolution Tree Networks for 3D Point Cloud Processing , 2018, ECCV.

[47]  Szymon Rusinkiewicz,et al.  Rotation Invariant Spherical Harmonic Representation of 3D Shape Descriptors , 2003, Symposium on Geometry Processing.

[48]  Shakir Mohamed,et al.  Variational Inference with Normalizing Flows , 2015, ICML.

[49]  David Duvenaud,et al.  FFJORD: Free-form Continuous Dynamics for Scalable Reversible Generative Models , 2018, ICLR.

[50]  知秀 柴田 5分で分かる!? 有名論文ナナメ読み:Jacob Devlin et al. : BERT : Pre-training of Deep Bidirectional Transformers for Language Understanding , 2020 .

[51]  Pratul P. Srinivasan,et al.  NeRF , 2020, ECCV.

[52]  Jiajun Wu,et al.  Learning a Probabilistic Latent Space of Object Shapes via 3D Generative-Adversarial Modeling , 2016, NIPS.

[53]  Oliver Grau,et al.  VConv-DAE: Deep Volumetric Shape Learning Without Object Labels , 2016, ECCV Workshops.