SegContrast: 3D Point Cloud Feature Representation Learning Through Self-Supervised Segment Discrimination

Semantic scene interpretation is essential for autonomous systems to operate in complex scenarios. While deep learning-based methods excel at this task, they rely on vast amounts of labeled data that is tedious to generate and might not cover all relevant classes sufficiently. Self-supervised representation learning has the prospect of reducing the amount of required labeled data by learning descriptive representations from unlabeled data. In this letter, we address the problem of representation learning for 3D point cloud data in the context of autonomous driving. We propose a new contrastive learning approach that aims at learning the structural context of the scene. Our approach extracts class-agnostic segments over the point cloud and applies the contrastive loss over these segments to discriminate between similar and dissimilar structures. We apply our method on data recorded with a 3D LiDAR. We show that our method achieves competitive performance and can learn a more descriptive feature representation than other state-of-the-art self-supervised contrastive point cloud methods.

[1]  Alexey Nekrasov,et al.  Mix3D: Out-of-Context Data Augmentation for 3D Scenes , 2021, 2021 International Conference on 3D Vision (3DV).

[2]  Hanqing Lu,et al.  Improving Multiple Object Tracking with Single Object Tracking , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[3]  S. Saripalli,et al.  LiDARNet: A Boundary-Aware Domain Adaptation Model for Point Cloud Semantic Segmentation , 2021, 2021 IEEE International Conference on Robotics and Automation (ICRA).

[4]  Cyrill Stachniss,et al.  Towards 3D LiDAR-based semantic scene understanding of 3D point cloud sequences: The SemanticKITTI Dataset , 2021, Int. J. Robotics Res..

[5]  A. Leonardis,et al.  SQN: Weakly-Supervised Semantic Segmentation of Large-Scale 3D Point Clouds with 1000x Fewer Labels , 2021, ECCV.

[6]  Hassan Foroosh,et al.  Panoptic-PolarNet: Proposal-free LiDAR Point Cloud Panoptic Segmentation , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[7]  Oriol Vinyals,et al.  Efficient Visual Pretraining with Contrastive Detection , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).

[8]  Xiangyu Zhang,et al.  You Only Look One-level Feature , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[9]  Yann LeCun,et al.  Barlow Twins: Self-Supervised Learning via Redundancy Reduction , 2021, ICML.

[10]  Leonidas J. Guibas,et al.  Weakly Supervised Learning of Rigid 3D Scene Flow , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[11]  Wouter Van Gansbeke,et al.  Unsupervised Semantic Segmentation by Contrasting Object Mask Proposals , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).

[12]  L. Gool,et al.  Exploring Cross-Image Pixel Contrast for Semantic Segmentation , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).

[13]  Rohit Girdhar,et al.  Self-Supervised Pretraining of 3D Features on any Point-Cloud , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).

[14]  Saining Xie,et al.  Exploring Data-Efficient 3D Scene Understanding with Contrastive Scene Contexts , 2020, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[15]  Leonidas J. Guibas,et al.  3DIoUMatch: Leveraging IoU Prediction for Semi-Supervised 3D Object Detection , 2020, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[16]  Alan Yuille,et al.  Robust Instance Segmentation through Reasoning about Multi-Object Occlusion , 2020, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[17]  Xinlei Chen,et al.  Exploring Simple Siamese Representation Learning , 2020, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[18]  Xinge Zhu,et al.  Cylindrical and Asymmetrical 3D Convolution Networks for LiDAR Segmentation , 2020, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[19]  Federico Tombari,et al.  Panoster: End-to-End Panoptic Segmentation of LiDAR Point Clouds , 2020, IEEE Robotics and Automation Letters.

[20]  C. Stachniss,et al.  Domain Transfer for Semantic Segmentation of LiDAR Data using Deep Neural Networks , 2020, 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[21]  C. Stachniss,et al.  LiDAR Panoptic Segmentation for Autonomous Driving , 2020, 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[22]  Song Han,et al.  Searching Efficient 3D Architectures with Sparse Point-Voxel Convolution , 2020, ECCV.

[23]  Alexei A. Efros,et al.  Contrastive Learning for Unpaired Image-to-Image Translation , 2020, ECCV.

[24]  Leonidas J. Guibas,et al.  PointContrast: Unsupervised Pre-training for 3D Point Cloud Understanding , 2020, ECCV.

[25]  Thomas Funkhouser,et al.  Complete & Label: A Domain Adaptation Approach to Semantic Segmentation of LiDAR Point Clouds , 2020, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[26]  Geoffrey E. Hinton,et al.  Big Self-Supervised Models are Strong Semi-Supervised Learners , 2020, NeurIPS.

[27]  Pierre H. Richemond,et al.  Bootstrap Your Own Latent: A New Approach to Self-Supervised Learning , 2020, NeurIPS.

[28]  Gal Chechik,et al.  Self-Supervised Learning for Domain Adaptation on Point Clouds , 2020, 2021 IEEE Winter Conference on Applications of Computer Vision (WACV).

[29]  Eren Erdal Aksoy,et al.  SalsaNext: Fast Semantic Segmentation of LiDAR Point Clouds for Autonomous Driving , 2020, ArXiv.

[30]  Biao Gao,et al.  SemanticPOSS: A Point Cloud Dataset with Large Quantity of Dynamic Instances , 2020, 2020 IEEE Intelligent Vehicles Symposium (IV).

[31]  Geoffrey E. Hinton,et al.  A Simple Framework for Contrastive Learning of Visual Representations , 2020, ICML.

[32]  Mohammed Bennamoun,et al.  Deep Learning for 3D Point Clouds: A Survey , 2019, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[33]  Ross B. Girshick,et al.  Momentum Contrast for Unsupervised Visual Representation Learning , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[34]  Cyrill Stachniss,et al.  RangeNet ++: Fast and Accurate LiDAR Semantic Segmentation , 2019, 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[35]  Cyrill Stachniss,et al.  Fast Instance and Semantic Segmentation Exploiting Local Connectivity, Metric Learning, and One-Shot Detection for Robotics , 2019, 2019 International Conference on Robotics and Automation (ICRA).

[36]  Leonidas J. Guibas,et al.  Deep Hough Voting for 3D Object Detection in Point Clouds , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[37]  Silvio Savarese,et al.  4D Spatio-Temporal ConvNets: Minkowski Convolutional Neural Networks , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[38]  Leonidas J. Guibas,et al.  KPConv: Flexible and Deformable Convolution for Point Clouds , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[39]  Juergen Gall,et al.  SemanticKITTI: A Dataset for Semantic Scene Understanding of LiDAR Sequences , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[40]  Xiaogang Wang,et al.  PointRCNN: 3D Object Proposal Generation and Detection From Point Cloud , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[41]  Oriol Vinyals,et al.  Representation Learning with Contrastive Predictive Coding , 2018, ArXiv.

[42]  Laurens van der Maaten,et al.  3D Semantic Segmentation with Submanifold Sparse Convolutional Networks , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[43]  Kurt Keutzer,et al.  SqueezeSeg: Convolutional Neural Nets with Recurrent CRF for Real-Time Road-Object Segmentation from 3D LiDAR Point Cloud , 2017, 2018 IEEE International Conference on Robotics and Automation (ICRA).

[44]  Leonidas J. Guibas,et al.  PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space , 2017, NIPS.

[45]  Laurens van der Maaten,et al.  Submanifold Sparse Convolutional Networks , 2017, ArXiv.

[46]  Leonidas J. Guibas,et al.  PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[47]  Ji Wan,et al.  Multi-view 3D Object Detection Network for Autonomous Driving , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[48]  Sebastian Ramos,et al.  The Cityscapes Dataset for Semantic Urban Scene Understanding , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[49]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[50]  Pietro Perona,et al.  Microsoft COCO: Common Objects in Context , 2014, ECCV.

[51]  Armin B. Cremers,et al.  Laser-based segment classification using a mixture of bag-of-words , 2013, 2013 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[52]  Andreas Geiger,et al.  Are we ready for autonomous driving? The KITTI vision benchmark suite , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[53]  Michael Himmelsbach,et al.  Fast segmentation of 3D point clouds for ground vehicles , 2010, 2010 IEEE Intelligent Vehicles Symposium.

[54]  Fei-Fei Li,et al.  ImageNet: A large-scale hierarchical image database , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[55]  Hans-Peter Kriegel,et al.  A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise , 1996, KDD.

[56]  Robert C. Bolles,et al.  Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography , 1981, CACM.

[57]  Frank Hutter,et al.  SGDR: Stochastic Gradient Descent with Restarts , 2016, ArXiv.

[58]  Nitish Srivastava,et al.  Dropout: a simple way to prevent neural networks from overfitting , 2014, J. Mach. Learn. Res..

[59]  Valdir Grassi Junior,et al.  Robotics , 2014, Communications in Computer and Information Science.