Paper Information - Box2Seg: Learning Semantics of 3D Point Clouds with Box-Level Supervision

Learning dense point-wise semantics from unstructured 3D point clouds with fewer labels, although a realistic problem, has been under-explored in the literature. While existing weakly supervised methods can effectively learn semantics with only a small fraction of point-level annotations, we find that vanilla bounding box-level annotation is also informative for semantic segmentation of large-scale 3D point clouds. In this paper, we introduce a neural architecture, termed Box2Seg, to learn point-level semantics of 3D point clouds with bounding box-level supervision. The key to our approach is to generate accurate pseudo labels by exploring the geometric and topological structure inside and outside each bounding box. Specifically, an attention-based self-training (AST) technique and Point Class Activation Mapping (PCAM) are utilized to estimate pseudo labels. The network is then further trained and refined with these pseudo labels. Experiments on two large-scale benchmarks, S3DIS and ScanNet, demonstrate the competitive performance of the proposed method. In particular, the proposed network can be trained with cheap, or even off-the-shelf, bounding box-level annotations and subcloud-level tags.
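To make the idea of box-level supervision concrete, the sketch below shows one simple way that bounding boxes could seed point-wise pseudo labels. It is not the AST or PCAM procedure from the paper, which the abstract does not specify in detail. The axis-aligned box format, the `UNLABELED` marker, and the function names are all assumptions made for illustration. A point inside boxes of exactly one class inherits that class. A point in no box, or in boxes of conflicting classes, stays unlabeled and would be left for a later refinement stage.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]
# Hypothetical box format (an assumption): (min_corner, max_corner, class_id),
# with the box axis-aligned.
Box = Tuple[Point, Point, int]

UNLABELED = -1  # point lies in no box, or in boxes of conflicting classes


def point_in_box(p: Point, box: Box) -> bool:
    """Return True if p lies inside the axis-aligned box (bounds inclusive)."""
    lo, hi, _ = box
    return all(lo[i] <= p[i] <= hi[i] for i in range(3))


def initial_pseudo_labels(points: Sequence[Point],
                          boxes: Sequence[Box]) -> List[int]:
    """Assign each point the class of its enclosing boxes when unambiguous."""
    labels = []
    for p in points:
        # Collect the distinct classes of all boxes that contain this point.
        classes = {box[2] for box in boxes if point_in_box(p, box)}
        labels.append(classes.pop() if len(classes) == 1 else UNLABELED)
    return labels


points = [(0.5, 0.5, 0.5), (1.5, 0.5, 0.5), (5.0, 5.0, 5.0)]
boxes = [
    ((0.0, 0.0, 0.0), (2.0, 1.0, 1.0), 3),  # illustrative class id 3
    ((1.0, 0.0, 0.0), (3.0, 1.0, 1.0), 7),  # overlapping box, class id 7
]
labels = initial_pseudo_labels(points, boxes)
print(labels)  # [3, -1, -1]
```

The first point falls in a single box and takes its class, the second sits in the overlap of two differently labeled boxes, and the third lies outside every box. The ambiguous and uncovered points are the cases where structure inside and outside each box has to be exploited to recover labels.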
