Dilated Nearest-Neighbor Encoding for 3D Semantic Segmentation of Point Clouds

Three-dimensional (3D) semantic segmentation is important in many scenarios, such as autonomous driving and robotic navigation. Random point sampling has proven to be computationally and memory efficient for tackling large-scale point clouds in semantic segmentation. However, it may discard information about small objects or object edges. Instead of down-sampling the point cloud directly, in this paper we propose a dilated nearest-neighbor encoding module that enlarges the receptive field to learn more 3D geometric information. To further reduce the number of layers required by previous neural networks, we design a multi-level hierarchical feature fusion network. We present an end-to-end 3D semantic segmentation framework built on the RandLA-Net backbone and these two key components: dilated nearest-neighbor encoding and efficient feature fusion. Experiments on a benchmark 3D dataset show that our framework outperforms other state-of-the-art approaches with fewer network layers.
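To make the core idea of dilated nearest-neighbor encoding concrete, the following sketch (a minimal NumPy illustration, not the paper's implementation) selects every d-th point among the k·d nearest neighbors, so each point aggregates information over a wider spatial extent while keeping the same neighbor count; the function name `dilated_knn` and the brute-force distance computation are illustrative assumptions.

```python
import numpy as np

def dilated_knn(points, k=16, d=2):
    """Hypothetical sketch: for each point, keep every d-th index
    among its k*d nearest neighbors, enlarging the receptive field
    without increasing the number of gathered neighbors."""
    # Pairwise squared Euclidean distances, shape (N, N).
    diff = points[:, None, :] - points[None, :, :]
    dist2 = np.sum(diff ** 2, axis=-1)
    # Indices of the k*d nearest neighbors (self included).
    nn_idx = np.argsort(dist2, axis=1)[:, :k * d]
    # Dilation: keep every d-th neighbor, yielding k indices per point.
    return nn_idx[:, ::d]

# Usage: 1024 random 3D points, 16 neighbors with dilation rate 2.
pts = np.random.rand(1024, 3).astype(np.float32)
idx = dilated_knn(pts, k=16, d=2)
print(idx.shape)  # (1024, 16)
```

With d = 1 this reduces to ordinary k-nearest-neighbor grouping; larger d trades local density for spatial coverage, which is the receptive-field enlargement the module relies on.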