RGAM: A novel network architecture for 3D point cloud semantic segmentation in indoor scenes

Abstract Three-dimensional (3D) point cloud semantic segmentation is an essential part of computer vision for scene comprehension. Nevertheless, due to their loss of detail, existing networks lack the ability to recognize complex scenes. This paper proposes a novel network architecture, called the ring grouping neural network with attention module (RGAM), which presents four improvements over the existing networks. First, novel multi-scale ring grouping learning is designed to extract the multi-scale neighborhood features without overlapped sampling, allowing the network to adapt to objects of different scales. Second, neighborhood information fusion is defined as the weighted sum of multiple neighborhood features, enabling the representation of each point to be considered in different neighborhoods. Third, in the global view, a spatial attention module is introduced among the neighborhoods, allowing long-range contextual information to be exploited for 3D point cloud semantic segmentation. Finally, a channel attention module is appended to the RGAM: the correlation of each channel with key information enhances the complex scene recognition ability of the RGAM. Experimental results on the challenging S3DIS, ScanNet, and NYU-V2 datasets demonstrate that the RGAM has stronger recognition ability than the existing networks based on several state-of-the-art algorithms for 3D point cloud semantic segmentation.

[1]  Bo Guo,et al.  Discriminative-Dictionary-Learning-Based Multilevel Point-Cluster Features for ALS Point-Cloud Classification , 2016, IEEE Transactions on Geoscience and Remote Sensing.

[2]  Hanzhen Xiao,et al.  Time-varying Nonholonomic Robot Consensus Formation Using Model Predictive Based Protocol With Switching Topology , 2021, Inf. Sci..

[3]  Yan Zhou,et al.  3D shape classification and retrieval based on polar view , 2019, Inf. Sci..

[4]  Wenbing Tao,et al.  Efficient convex optimization-based texture mapping for large-scale 3D scene reconstruction , 2021, Inf. Sci..

[5]  Jun Zhou,et al.  A future intelligent traffic system with mixed autonomous vehicles and human-driven vehicles , 2020, Inf. Sci..

[6]  Te-Feng Su,et al.  Thinning algorithms based on quadtree and octree representations , 2006, Inf. Sci..

[7]  Gernot Riegler,et al.  OctNet: Learning Deep 3D Representations at High Resolutions , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[8]  Lei Wang,et al.  MSNet: Multi-Scale Convolutional Network for Point Cloud Classification , 2018, Remote. Sens..

[9]  Lukasz Kaiser,et al.  Attention is All you Need , 2017, NIPS.

[10]  Abhinav Gupta,et al.  Non-local Neural Networks , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[11]  Leonidas J. Guibas,et al.  PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space , 2017, NIPS.

[12]  Ulrich Neumann,et al.  SGPN: Similarity Group Proposal Network for 3D Point Cloud Instance Segmentation , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[13]  Trevor Darrell,et al.  DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition , 2013, ICML.

[14]  Subhransu Maji,et al.  3D Shape Segmentation with Projective Convolutional Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[15]  Feng Lu,et al.  VoxSegNet: Volumetric CNNs for Semantic Part Segmentation of 3D Shapes , 2018, IEEE Transactions on Visualization and Computer Graphics.

[16]  Yue Wang,et al.  Dynamic Graph CNN for Learning on Point Clouds , 2018, ACM Trans. Graph..

[17]  James R. Lersch,et al.  Context-driven automated target detection in 3D data , 2004, SPIE Defense + Commercial Sensing.

[18]  Markus H. Gross,et al.  A Network Architecture for Point Cloud Classification via Automatic Depth Images Generation , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[19]  George Vosselman,et al.  Recognizing basic structures from mobile laser scanning data for road inventory studies , 2011 .

[20]  Thomas Brox,et al.  Octree Generating Networks: Efficient Convolutional Architectures for High-resolution 3D Outputs , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[21]  Andrew E. Johnson,et al.  Using Spin Images for Efficient Object Recognition in Cluttered 3D Scenes , 1999, IEEE Trans. Pattern Anal. Mach. Intell..

[22]  Derek Hoiem,et al.  Indoor Segmentation and Support Inference from RGBD Images , 2012, ECCV.

[23]  Chi-Wing Fu,et al.  PointWeb: Enhancing Local Neighborhood Features for Point Cloud Processing , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[24]  Frank Rosenblatt,et al.  PRINCIPLES OF NEURODYNAMICS. PERCEPTRONS AND THE THEORY OF BRAIN MECHANISMS , 1963 .

[25]  Zhaoxia Guo,et al.  A turning point-based offline map matching algorithm for urban road networks , 2021, Inf. Sci..

[26]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[27]  Liang Zhang,et al.  A Deep Neural Network With Spatial Pooling (DNNSP) for 3-D Point Cloud Classification , 2018, IEEE Transactions on Geoscience and Remote Sensing.

[28]  Silvio Savarese,et al.  3D Semantic Parsing of Large-Scale Indoor Spaces , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[29]  Markus Vincze,et al.  Enhancing Semantic Segmentation for Robotics: The Power of 3-D Entangled Forests , 2016, IEEE Robotics and Automation Letters.

[30]  Trevor Darrell,et al.  Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[31]  Vladlen Koltun,et al.  Tangent Convolutions for Dense Prediction in 3D , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[32]  Iasonas Kokkinos,et al.  DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[33]  Jiahao Fan,et al.  A Novel Simplification Method for 3D Geometric Point Cloud Based on the Importance of Point , 2019, IEEE Access.

[34]  Victor S. Lempitsky,et al.  Escape from Cells: Deep Kd-Networks for the Recognition of 3D Point Cloud Models , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[35]  Leonidas J. Guibas,et al.  FPNN: Field Probing Neural Networks for 3D Data , 2016, NIPS.

[36]  Cheng Wang,et al.  Fast neighbor search by using revised k-d tree , 2019, Inf. Sci..

[37]  Gang Wang,et al.  Multi-modal Unsupervised Feature Learning for RGB-D Scene Labeling , 2014, ECCV.

[38]  Li Li,et al.  Segmentation-based classification for 3D point clouds in the road environment , 2018 .

[39]  Naveed Akhtar,et al.  Octree Guided CNN With Spherical Kernels for 3D Point Clouds , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[40]  Zhendong Mao,et al.  Hierarchical multi-view context modelling for 3D object classification and retrieval , 2021, Inf. Sci..