MASS: Multi-Attentional Semantic Segmentation of LiDAR Data for Dense Top-View Understanding

At the heart of all automated driving systems is the ability to sense the surroundings, e.g., through semantic segmentation of LiDAR sequences, which has seen remarkable progress since the release of large datasets such as SemanticKITTI and nuScenes-LidarSeg. While most previous works focus on sparse segmentation of the LiDAR input, dense output masks provide self-driving cars with almost complete environment information. In this paper, we introduce MASS, a Multi-Attentional Semantic Segmentation model specifically built for dense top-view understanding of driving scenes. Our framework operates on pillar and occupancy features and comprises three attention-based building blocks: (1) a keypoint-driven graph attention, (2) an LSTM-based attention computed from a vector embedding of the spatial input, and (3) a pillar-based attention, resulting in a dense 360° segmentation mask. With extensive experiments on both SemanticKITTI and nuScenes-LidarSeg, we quantitatively demonstrate the effectiveness of our model, outperforming the state of the art by 19.0% on SemanticKITTI and reaching 30.4% in mIoU on nuScenes-LidarSeg, where MASS is the first work addressing the dense segmentation task. Furthermore, our multi-attention model is shown to be very effective for 3D object detection, validated on the KITTI-3D dataset, showcasing its high generalizability to other tasks related to 3D vision.
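The mIoU figures quoted above refer to the mean intersection-over-union metric. As a minimal sketch of how such a score can be computed on a dense top-view label grid, the following uses the standard confusion-matrix formulation; the function name `mean_iou`, the `ignore_index` value of 255, and the choice to skip classes absent from both prediction and ground truth are illustrative assumptions, not details taken from the paper's evaluation protocol.

```python
import numpy as np

def mean_iou(pred, target, num_classes, ignore_index=255):
    """Mean IoU over classes that occur in the prediction or the ground truth.

    pred, target: integer label grids of identical shape (e.g. H x W top view).
    ignore_index: label marking unlabeled cells (assumed value, not from the paper).
    """
    pred = np.asarray(pred).ravel()
    target = np.asarray(target).ravel()

    # Drop cells without a ground-truth label.
    valid = target != ignore_index
    pred, target = pred[valid], target[valid]

    # Confusion matrix: rows are ground-truth classes, columns are predictions.
    conf = np.bincount(
        target * num_classes + pred, minlength=num_classes ** 2
    ).reshape(num_classes, num_classes)

    true_pos = np.diag(conf)
    union = conf.sum(axis=0) + conf.sum(axis=1) - true_pos

    # Average only over classes that appear at least once.
    present = union > 0
    if not present.any():
        return float("nan")
    return float((true_pos[present] / union[present]).mean())


# Tiny 2 x 2 grid with one unlabeled cell.
prediction = [[0, 1], [1, 1]]
ground_truth = [[0, 1], [0, 255]]
score = mean_iou(prediction, ground_truth, num_classes=3)
```

In this toy grid, classes 0 and 1 each reach an IoU of 0.5, and class 2 is skipped because it never occurs, so the mean is 0.5.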
