PointGLR: Unsupervised Structural Representation Learning of 3D Point Clouds

This work explores the use of global and local structures of 3D point clouds as a free and powerful supervision signal for representation learning. Local and global patterns of a 3D object are closely related. Although each part of an object is incomplete, the underlying attributes about the object are shared among all parts, which makes reasoning about the whole object from a single part possible. We hypothesize that a powerful representation of a 3D object should model the attributes that are shared between parts and the whole object, and distinguishable from other objects. Based on this hypothesis, we propose a new framework to learn point cloud representations by bidirectional reasoning between the local structures at different abstraction hierarchies and the global shape. Moreover, we extend the unsupervised structural representation learning method to more complex 3D scenes. By introducing structural proxies as the intermediate-level representations between local and global ones, we propose a hierarchical reasoning scheme among local parts, structural proxies, and the overall point cloud to learn powerful 3D representations in an unsupervised manner. Extensive experimental results demonstrate that the unsupervised representations can be very competitive alternatives of supervised representations in discriminative power, and exhibit better performance in generalization ability and robustness. Our method establishes the new state-of-the-art of unsupervised/few-shot 3D object classification and part segmentation. We also show our method can serve as a simple yet effective regime for model pre-training on 3D scene segmentation and detection tasks. We expect our observations to offer a new perspective on learning better representations from data structures instead of human annotations for point cloud understanding.

[1]  Julien Mairal,et al.  Emerging Properties in Self-Supervised Vision Transformers , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).

[2]  Leonidas J. Guibas,et al.  PointContrast: Unsupervised Pre-training for 3D Point Cloud Understanding , 2020, ECCV.

[3]  Pierre H. Richemond,et al.  Bootstrap Your Own Latent: A New Approach to Self-Supervised Learning , 2020, NeurIPS.

[4]  Haitao Yang,et al.  H3DNet: 3D Object Detection Using Hybrid Geometric Primitives , 2020, ECCV.

[5]  Jie Zhou,et al.  Global-Local Bidirectional Reasoning for Unsupervised Representation Learning of 3D Point Clouds , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[6]  Geoffrey E. Hinton,et al.  A Simple Framework for Contrastive Learning of Visual Representations , 2020, ICML.

[7]  Ross B. Girshick,et al.  Momentum Contrast for Unsupervised Visual Representation Learning , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[8]  Phillip Isola,et al.  Contrastive Multiview Coding , 2019, ECCV.

[9]  Ali Razavi,et al.  Data-Efficient Image Recognition with Contrastive Predictive Coding , 2019, ICML.

[10]  Kaveh Hassani,et al.  Unsupervised Multi-Task Feature Learning on Point Clouds , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[11]  Jiwen Lu,et al.  DensePoint: Learning Densely Contextual Representation for Efficient Point Cloud Processing , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[12]  Duc Thanh Nguyen,et al.  Revisiting Point Cloud Classification: A New Benchmark Dataset and Classification Model on Real-World Data , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[13]  Matthias Zwicker,et al.  L2G Auto-encoder: Understanding Point Clouds by Local-to-Global Reconstruction with Hierarchical Self-Attention , 2019, ACM Multimedia.

[14]  Matthias Zwicker,et al.  Multi-Angle Point Cloud-VAE: Unsupervised Feature Learning for 3D Point Clouds From Multiple Angles by Joint Self-Reconstruction and Half-to-Half Prediction , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[15]  R Devon Hjelm,et al.  Learning Representations by Maximizing Mutual Information Across Views , 2019, NeurIPS.

[16]  Leonidas J. Guibas,et al.  Deep Hough Voting for 3D Object Detection in Point Clouds , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[17]  Silvio Savarese,et al.  4D Spatio-Temporal ConvNets: Minkowski Convolutional Neural Networks , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[18]  Leonidas J. Guibas,et al.  KPConv: Flexible and Deformable Convolution for Point Clouds , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[19]  Shiming Xiang,et al.  Relation-Shape Convolutional Neural Network for Point Cloud Analysis , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[20]  Subhransu Maji,et al.  Meta-Learning With Differentiable Convex Optimization , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[21]  Federico Tombari,et al.  3D Point Capsule Networks , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[22]  Matthias Nießner,et al.  3D-SIS: 3D Semantic Instance Segmentation of RGB-D Scans , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[23]  Leonidas J. Guibas,et al.  GSPN: Generative Shape Proposal Network for 3D Instance Segmentation in Point Cloud , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[24]  Fuxin Li,et al.  PointConv: Deep Convolutional Networks on 3D Point Clouds , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[25]  Matthias Zwicker,et al.  View Inter-Prediction GAN: Unsupervised Representation Learning for 3D Shapes by Learning Global Shape Memories to Support Local View Predictions , 2018, AAAI.

[26]  Yang Zhang,et al.  Point Cloud GAN , 2018, DGS@ICLR.

[27]  Enrico Magli,et al.  Learning Localized Generative Models for 3D Point Clouds via Graph Convolution , 2018, ICLR.

[28]  Yoshua Bengio,et al.  Learning deep representations by mutual information estimation and maximization , 2018, ICLR.

[29]  Yue Wang,et al.  Dynamic Graph CNN for Learning on Point Clouds , 2018, ACM Trans. Graph..

[30]  Stefanos Zafeiriou,et al.  ArcFace: Additive Angular Margin Loss for Deep Face Recognition , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[31]  Wei Wu,et al.  PointCNN: Convolution On X-Transformed Points , 2018, NeurIPS.

[32]  Slobodan Ilic,et al.  PPF-FoldNet: Unsupervised Learning of Rotation Invariant 3D Local Descriptors , 2018, ECCV.

[33]  Subhransu Maji,et al.  Multiresolution Tree Networks for 3D Point Cloud Processing , 2018, ECCV.

[34]  Oriol Vinyals,et al.  Representation Learning with Contrastive Predictive Coding , 2018, ArXiv.

[35]  Stella X. Yu,et al.  Unsupervised Feature Learning via Non-parametric Instance Discrimination , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[36]  Yifan Xu,et al.  SpiderCNN: Deep Learning on Point Sets with Parameterized Convolutional Filters , 2018, ECCV.

[37]  Yaron Lipman,et al.  Point convolutional neural networks by extension operators , 2018, ACM Trans. Graph..

[38]  Jiaxin Li,et al.  SO-Net: Self-Organizing Network for Point Cloud Analysis , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[39]  Luke S. Zettlemoyer,et al.  Deep Contextualized Word Representations , 2018, NAACL.

[40]  Xing Ji,et al.  CosFace: Large Margin Cosine Loss for Deep Face Recognition , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[41]  Dong Tian,et al.  FoldingNet: Point Cloud Auto-Encoder via Deep Grid Deformation , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[42]  Tao Xiang,et al.  Learning to Compare: Relation Network for Few-Shot Learning , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[43]  Leonidas J. Guibas,et al.  Learning Representations and Generative Models for 3D Point Clouds , 2017, ICML.

[44]  C. Lee Giles,et al.  Learning a Hierarchical Latent-Variable Model of 3D Shapes , 2017, 2018 International Conference on 3D Vision (3DV).

[45]  Andrew Zisserman,et al.  Multi-task Self-Supervised Visual Learning , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[46]  Anath Fischer,et al.  3D Point Cloud Registration for Localization Using a Deep Neural Network Auto-Encoder , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[47]  Leonidas J. Guibas,et al.  PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space , 2017, NIPS.

[48]  Bhiksha Raj,et al.  SphereFace: Deep Hypersphere Embedding for Face Recognition , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[49]  Richard S. Zemel,et al.  Prototypical Networks for Few-shot Learning , 2017, NIPS.

[50]  Sergey Levine,et al.  Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks , 2017, ICML.

[51]  Matthias Nießner,et al.  ScanNet: Richly-Annotated 3D Reconstructions of Indoor Scenes , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[52]  Hao Su,et al.  A Point Set Generation Network for 3D Object Reconstruction from a Single Image , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[53]  Leonidas J. Guibas,et al.  PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[54]  Matthias Nießner,et al.  3DMatch: Learning Local Geometric Descriptors from RGB-D Reconstructions , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[55]  Kihyuk Sohn,et al.  Improved Deep Metric Learning with Multi-class N-pair Loss Objective , 2016, NIPS.

[56]  Jiajun Wu,et al.  Learning a Probabilistic Latent Space of Object Shapes via 3D Generative-Adversarial Modeling , 2016, NIPS.

[57]  Oliver Grau,et al.  VConv-DAE: Deep Volumetric Shape Learning Without Object Labels , 2016, ECCV Workshops.

[58]  Abhinav Gupta,et al.  Learning a Predictable and Generative Vector Representation for Objects , 2016, ECCV.

[59]  Koray Kavukcuoglu,et al.  Pixel Recurrent Neural Networks , 2016, ICML.

[60]  Jianxiong Xiao,et al.  Deep Sliding Shapes for Amodal 3D Object Detection in RGB-D Images , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[61]  Alexei A. Efros,et al.  Unsupervised Visual Representation Learning by Context Prediction , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[62]  Geoffrey E. Hinton,et al.  Distilling the Knowledge in a Neural Network , 2015, ArXiv.

[63]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[64]  Trevor Darrell,et al.  Fully Convolutional Networks for Semantic Segmentation , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[65]  Jianxiong Xiao,et al.  3D ShapeNets: A deep representation for volumetric shapes , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[66]  Yoshua Bengio,et al.  Generative Adversarial Nets , 2014, NIPS.

[67]  Trevor Darrell,et al.  Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[68]  Nitish Srivastava,et al.  Dropout: a simple way to prevent neural networks from overfitting , 2014, J. Mach. Learn. Res..

[69]  Koray Kavukcuoglu,et al.  Learning word embeddings efficiently with noise-contrastive estimation , 2013, NIPS.

[70]  Pascal Vincent,et al.  Representation Learning: A Review and New Perspectives , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[71]  Dong-Hyun Lee,et al.  Pseudo-Label : The Simple and Efficient Semi-Supervised Learning Method for Deep Neural Networks , 2013 .

[72]  Fei-Fei Li,et al.  ImageNet: A large-scale hierarchical image database , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[73]  Geoffrey E. Hinton,et al.  Visualizing Data using t-SNE , 2008 .

[74]  Geoffrey E. Hinton,et al.  Reducing the Dimensionality of Data with Neural Networks , 2006, Science.

[75]  Corinna Cortes,et al.  Support-Vector Networks , 1995, Machine Learning.

[76]  Kunihiko Fukushima,et al.  Neocognitron: A Self-Organizing Neural Network Model for a Mechanism of Visual Pattern Recognition , 1982 .