Box2Seg: Learning Semantics of 3D Point Clouds with Box-Level Supervision

Learning dense point-wise semantics from unstructured 3D point clouds with fewer labels, although a realistic problem, has been under-explored in literature. While existing weakly supervised methods can effectively learn semantics with only a small fraction of point-level annotations, we find that the vanilla bounding box-level annotation is also informative for semantic segmentation of large-scale 3D point clouds. In this paper, we introduce a neural architecture, termed Box2Seg, to learn point-level semantics of 3D point clouds with bounding box-level supervision. The key to our approach is to generate accurate pseudo labels by exploring the geometric and topological structure inside and outside each bounding box. Specifically, an attention-based self-training (AST) technique and Point Class Activation Mapping (PCAM) are utilized to estimate pseudolabels. The network is further trained and refined with pseudo labels. Experiments on two large-scale benchmarks including S3DIS and ScanNet demonstrate the competitive performance of the proposed method. In particular, the proposed network can be trained with cheap, or even off-the-shelf bounding box-level annotations and subcloud-level tags.

[1]  Saining Xie,et al.  Exploring Data-Efficient 3D Scene Understanding with Contrastive Scene Contexts , 2020, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[2]  Laurens van der Maaten,et al.  3D Semantic Segmentation with Submanifold Sparse Convolutional Networks , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[3]  Ajmal Mian,et al.  Spherical Kernel for Efficient Graph Convolution on 3D Point Clouds , 2020, IEEE transactions on pattern analysis and machine intelligence.

[4]  Alexander G. Schwing,et al.  3D Spatial Recognition without Spatially Labeled 3D , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[5]  Kurt Keutzer,et al.  SqueezeSeg: Convolutional Neural Nets with Recurrent CRF for Real-Time Road-Object Segmentation from 3D LiDAR Point Cloud , 2017, 2018 IEEE International Conference on Robotics and Automation (ICRA).

[6]  Bichen Wu,et al.  SqueezeSegV3: Spatially-Adaptive Convolution for Efficient Point-Cloud Segmentation , 2020, ECCV.

[7]  Fuxin Li,et al.  PointConv: Deep Convolutional Networks on 3D Point Clouds , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[8]  Song Han,et al.  Point-Voxel CNN for Efficient 3D Deep Learning , 2019, NeurIPS.

[9]  Chi-Wing Fu,et al.  One Thing One Click: A Self-Training Approach for Weakly Supervised 3D Semantic Segmentation , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[10]  Leonidas J. Guibas,et al.  KPConv: Flexible and Deformable Convolution for Point Clouds , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[11]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[12]  Tian Zheng,et al.  OccuSeg: Occupancy-Aware 3D Instance Segmentation , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[13]  Subhransu Maji,et al.  SPLATNet: Sparse Lattice Networks for Point Cloud Processing , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[14]  Jiwen Lu,et al.  SegGroup: Seg-Level Supervision for 3D Instance and Semantic Segmentation , 2020, IEEE Transactions on Image Processing.

[15]  Xinge Zhu,et al.  Cylindrical and Asymmetrical 3D Convolution Networks for LiDAR Segmentation , 2020, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[16]  Kurt Keutzer,et al.  SqueezeSegV2: Improved Model Structure and Unsupervised Domain Adaptation for Road-Object Segmentation from a LiDAR Point Cloud , 2018, 2019 International Conference on Robotics and Automation (ICRA).

[17]  Gim Hee Lee,et al.  Weakly Supervised Semantic Point Cloud Segmentation: Towards 10× Fewer Labels , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[18]  Andrew Blake,et al.  "GrabCut" , 2004, ACM Trans. Graph..

[19]  Silvio Savarese,et al.  SEGCloud: Semantic Segmentation of 3D Point Clouds , 2017, 2017 International Conference on 3D Vision (3DV).

[20]  Le Hui,et al.  SSPC-Net: Semi-supervised Semantic 3D Point Cloud Segmentation Network , 2021, AAAI.

[21]  Leonidas J. Guibas,et al.  PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space , 2017, NIPS.

[22]  Guosheng Lin,et al.  Multi-Path Region Mining for Weakly Supervised 3D Semantic Segmentation on Point Clouds , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[23]  Leonidas J. Guibas,et al.  PointContrast: Unsupervised Pre-training for 3D Point Cloud Understanding , 2020, ECCV.

[24]  Harri Valpola,et al.  Weight-averaged consistency targets improve semi-supervised deep learning results , 2017, ArXiv.

[25]  Ales Leonardis,et al.  SQN: Weakly-Supervised Semantic Segmentation of Large-Scale 3D Point Clouds with 1000x Fewer Labels , 2021, ECCV.

[26]  Lin Gao,et al.  VV-Net: Voxel VAE Net With Group Convolutions for Point Cloud Segmentation , 2018, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[27]  Mathias Schmitt,et al.  Human-machine-interaction in the industry 4.0 era , 2014, 2014 12th IEEE International Conference on Industrial Informatics (INDIN).

[28]  Alexandre Boulch,et al.  Unstructured Point Cloud Semantic Labeling Using Deep Segmentation Networks , 2017, 3DOR@Eurographics.

[29]  Bo Yang,et al.  RandLA-Net: Efficient Semantic Segmentation of Large-Scale Point Clouds , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[30]  Matthew R. Walter,et al.  Understanding Natural Language Commands for Robotic Navigation and Mobile Manipulation , 2011, AAAI.

[31]  Matthias Nießner,et al.  ScanNet: Richly-Annotated 3D Reconstructions of Indoor Scenes , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[32]  Bolei Zhou,et al.  Learning Deep Features for Discriminative Localization , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[33]  Silvio Savarese,et al.  4D Spatio-Temporal ConvNets: Minkowski Convolutional Neural Networks , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[34]  Cyrill Stachniss,et al.  SemanticKITTI: A Dataset for Semantic Scene Understanding of LiDAR Sequences , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[35]  Marc Pollefeys,et al.  Semantic3D.net: A new Large-scale Point Cloud Classification Benchmark , 2017, ArXiv.

[36]  Nassir Navab,et al.  Fully-Convolutional Point Networks for Large-Scale Point Clouds , 2018, ECCV.

[37]  Leonidas J. Guibas,et al.  ShapeNet: An Information-Rich 3D Model Repository , 2015, ArXiv.

[38]  Martin Simonovsky,et al.  Large-Scale Point Cloud Semantic Segmentation with Superpoint Graphs , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[39]  Raquel Urtasun,et al.  Efficient Convolutions for Real-Time Semantic Segmentation of 3D Point Clouds , 2018, 2018 International Conference on 3D Vision (3DV).

[40]  Raquel Urtasun,et al.  Deep Parametric Continuous Convolutional Neural Networks , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[41]  Ajmal Mian,et al.  SegGCN: Efficient 3D Point Cloud Segmentation With Fuzzy Spherical Kernel , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[42]  Qiang Xu,et al.  nuScenes: A Multimodal Dataset for Autonomous Driving , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[43]  Yanyun Qu,et al.  Perturbed Self-Distillation: Weakly Supervised Large-Scale Point Cloud Semantic Segmentation , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).

[44]  Leonidas J. Guibas,et al.  PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[45]  Chi-Wing Fu,et al.  PointWeb: Enhancing Local Neighborhood Features for Point Cloud Processing , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[46]  Mohammed Bennamoun,et al.  Deep Learning for 3D Point Clouds: A Survey , 2019, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[47]  Dragomir Anguelov,et al.  Scalability in Perception for Autonomous Driving: Waymo Open Dataset , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[48]  Jiamao Li,et al.  3D Recurrent Neural Networks with Context Fusion for Point Cloud Semantic Segmentation , 2018, ECCV.

[49]  Pascal Fua,et al.  SLIC Superpixels Compared to State-of-the-Art Superpixel Methods , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[50]  Leonidas J. Guibas,et al.  PartNet: A Large-Scale Benchmark for Fine-Grained and Hierarchical Part-Level 3D Object Understanding , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[51]  Tao Mei,et al.  Weakly Supervised Semantic Segmentation for Large-Scale Point Cloud , 2021, AAAI.

[52]  Thomas Funkhouser,et al.  Virtual Multi-view Fusion for 3D Semantic Segmentation , 2020, ECCV.

[53]  Silvio Savarese,et al.  Joint 2D-3D-Semantic Data for Indoor Scene Understanding , 2017, ArXiv.

[54]  Vladlen Koltun,et al.  Tangent Convolutions for Dense Prediction in 3D , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[55]  Song Han,et al.  Searching Efficient 3D Architectures with Sparse Point-Voxel Convolution , 2020, ECCV.

[56]  Timo Aila,et al.  Temporal Ensembling for Semi-Supervised Learning , 2016, ICLR.

[57]  Vijay Kumar,et al.  Robotic grasping and contact: a review , 2000, Proceedings 2000 ICRA. Millennium Conference. IEEE International Conference on Robotics and Automation. Symposia Proceedings (Cat. No.00CH37065).

[58]  Thomas Brox,et al.  U-Net: Convolutional Networks for Biomedical Image Segmentation , 2015, MICCAI.

[59]  Wei Wu,et al.  PointCNN: Convolution On X-Transformed Points , 2018, NeurIPS.

[60]  Andrew Markham,et al.  Towards Semantic Segmentation of Urban-Scale 3D Point Clouds: A Dataset, Benchmarks and Challenges , 2020, ArXiv.

[61]  Akshay Rangesh,et al.  3D BAT: A Semi-Automatic, Web-based 3D Annotation Toolbox for Full-Surround, Multi-Modal Data Streams , 2019, 2019 IEEE Intelligent Vehicles Symposium (IV).