Semantic segmentation of 3D indoor LiDAR point clouds through feature pyramid architecture search

Abstract

Semantic segmentation of 3D Light Detection and Ranging (LiDAR) indoor point clouds using deep learning has been an active research topic in recent years. However, most deep neural networks for point clouds perform multi-level feature fusion through a simple U-shaped architecture, which lacks sufficient capacity for both classification and localization in the segmentation task. In this paper, we propose a Neural Architecture Search (NAS) method that searches for a Feature Pyramid Network (FPN) module for 3D indoor point cloud semantic segmentation. Specifically, we aim to automatically find an effective feature pyramid architecture to serve as a feature fusion neck, within a novel pyramidal search space that covers all information communication paths among multi-level features. The searched FPN module, named SFPN, retains the most important connections among all potential paths to fuse representations at different levels. SFPN is generic and effective, and it can be added to existing segmentation networks to improve their segmentation performance. Extensive experiments on ScanNet and S3DIS show that different classical networks combined with SFPN achieve consistent and significant gains in segmentation performance. In particular, PointNet++-SFPN achieves mean Intersection-over-Union (mIoU) gains of 7.8% on ScanNet v2 and 4.7% on S3DIS, and PointConv-SFPN achieves gains of 4.5% and 3.7% on the two datasets, respectively.
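The abstract does not say how the search itself is carried out, so the following is only a hedged illustration of what "a pyramidal search space covering all information communication paths" could look like. It assumes a DARTS-style continuous relaxation: every pyramid level may feed every fused output level, each connection carries a learnable scalar weight (`alpha`), fused features are softmax-weighted sums of their sources, and the final module keeps the `k` strongest incoming connections per level. The functions `build_search_space`, `fuse` and `derive_architecture`, the parameter `k`, and the top-k pruning rule are all hypothetical and are not the authors' SFPN procedure.

```python
import math


def build_search_space(num_levels):
    """Enumerate every directed connection in a hypothetical pyramidal search space.

    Source level src may feed fused output level dst for every (src, dst) pair,
    which covers top-down (src > dst), bottom-up (src < dst) and lateral
    (src == dst) paths. Level 0 is assumed to be the densest point set.
    """
    return [(src, dst) for src in range(num_levels) for dst in range(num_levels)]


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(features, alpha, dst):
    """Assumed DARTS-style relaxation: softmax-weighted sum of all source levels.

    features[src] must already be resampled to the points of level dst
    (e.g. by interpolation), so every list has the same length.
    """
    weights = softmax([alpha[(src, dst)] for src in range(len(features))])
    length = len(features[0])
    return [sum(w * f[i] for w, f in zip(weights, features)) for i in range(length)]


def derive_architecture(alpha, num_levels, k):
    """Hypothetical discretization: keep the k strongest incoming connections per level.

    Ties are broken toward the lower (denser) source level so the result is deterministic.
    """
    arch = {}
    for dst in range(num_levels):
        weights = softmax([alpha[(src, dst)] for src in range(num_levels)])
        ranked = sorted(range(num_levels), key=lambda src: (-weights[src], src))
        arch[dst] = sorted(ranked[:k])
    return arch


if __name__ == "__main__":
    levels = 3
    # Architecture parameters; in a real search they would be learned with the network.
    alpha = {edge: 0.0 for edge in build_search_space(levels)}
    alpha[(2, 0)] = 2.0  # favor the top-down path from the coarsest level into level 0
    alpha[(0, 0)] = 1.0  # and the lateral path at level 0
    features = [[3.0, 0.0], [0.0, 3.0], [0.0, 0.0]]  # toy per-point features, pre-resampled
    print(fuse(features, alpha, dst=1))
    print(derive_architecture(alpha, levels, k=2))
```

Keeping only the top-k edges echoes the abstract's description of SFPN as retaining "the most important connections among all potential paths"; whether the real search ranks connections this way, and what `k` would be, are assumptions of this sketch.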
