SalsaNext: Fast, Uncertainty-Aware Semantic Segmentation of LiDAR Point Clouds

In this paper, we introduce SalsaNext for the uncertainty-aware semantic segmentation of a full 3D LiDAR point cloud in real-time. SalsaNext is the next version of SalsaNet [1] which has an encoder-decoder architecture where the encoder unit has a set of ResNet blocks and the decoder part combines upsampled features from the residual blocks. In contrast to SalsaNet, we introduce a new context module, replace the ResNet encoder blocks with a new residual dilated convolution stack with gradually increasing receptive fields and add the pixel-shuffle layer in the decoder. Additionally, we switch from stride convolution to average pooling and also apply central dropout treatment. To directly optimize the Jaccard index, we further combine the weighted cross-entropy loss with Lovasz-Softmax loss [2]. We finally inject a Bayesian treatment to compute the epistemic and aleatoric uncertainties for each point in the cloud. We provide a thorough quantitative evaluation on the Semantic-KITTI dataset [3], which demonstrates that the proposed SalsaNext outperforms other state-of-the-art semantic segmentation networks and ranks first on the Semantic-KITTI leaderboard. We also release our source code this https URL.

[1]  Davide Scaramuzza,et al.  A General Framework for Uncertainty Estimation in Deep Learning , 2020, IEEE Robotics and Automation Letters.

[2]  Roberto Cipolla,et al.  Bayesian SegNet: Model Uncertainty in Deep Convolutional Encoder-Decoder Architectures for Scene Understanding , 2015, BMVC.

[3]  W. Hager,et al.  and s , 2019, Shallow Water Hydraulics.

[4]  Klaus C. J. Dietmayer,et al.  Towards Safe Autonomous Driving: Capture Uncertainty in the Deep Neural Network For Lidar 3D Vehicle Detection , 2018, 2018 21st International Conference on Intelligent Transportation Systems (ITSC).

[5]  Alex Kendall,et al.  Concrete Dropout , 2017, NIPS.

[6]  Zoubin Ghahramani,et al.  Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning , 2015, ICML.

[7]  Vladlen Koltun,et al.  Tangent Convolutions for Dense Prediction in 3D , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[8]  Peter Wonka,et al.  Point Cloud Instance Segmentation using Probabilistic Embeddings , 2019, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[9]  Thomas Brox,et al.  Uncertainty Estimates and Multi-hypotheses Networks for Optical Flow , 2018, ECCV.

[10]  Cyrill Stachniss,et al.  RangeNet ++: Fast and Accurate LiDAR Semantic Segmentation , 2019, 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[11]  Roberto Cipolla,et al.  Fast-SCNN: Fast Semantic Segmentation Network , 2019, BMVC.

[12]  George Papandreou,et al.  Rethinking Atrous Convolution for Semantic Image Segmentation , 2017, ArXiv.

[13]  Neil Genzlinger A. and Q , 2006 .

[14]  Daniel Rueckert,et al.  Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[15]  Leonidas J. Guibas,et al.  Frustum PointNets for 3D Object Detection from RGB-D Data , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[16]  Leonidas J. Guibas,et al.  PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[17]  Mohammed Bennamoun,et al.  Deep Learning for 3D Point Clouds: A Survey , 2019, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[18]  Thomas Brox,et al.  U-Net: Convolutional Networks for Biomedical Image Segmentation , 2015, MICCAI.

[19]  Sven Behnke,et al.  LatticeNet: Fast Point Cloud Segmentation Using Permutohedral Lattices , 2019, Robotics: Science and Systems.

[20]  Yuan Wang,et al.  PointSeg: Real-Time Semantic Segmentation Based on 3D LiDAR Point Cloud , 2018, ArXiv.

[21]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[22]  Trevor Darrell,et al.  Fully Convolutional Networks for Semantic Segmentation , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[23]  Andreas Geiger,et al.  Are we ready for autonomous driving? The KITTI vision benchmark suite , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[24]  Michael Felsberg,et al.  Deep Projective 3D Semantic Segmentation , 2017, CAIP.

[25]  Silvio Savarese,et al.  SEGCloud: Semantic Segmentation of 3D Point Clouds , 2017, 2017 International Conference on 3D Vision (3DV).

[26]  Leonidas J. Guibas,et al.  PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space , 2017, NIPS.

[27]  Matthew B. Blaschko,et al.  The Lovasz-Softmax Loss: A Tractable Surrogate for the Optimization of the Intersection-Over-Union Measure in Neural Networks , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[28]  Bo Yang,et al.  RandLA-Net: Efficient Semantic Segmentation of Large-Scale Point Clouds , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[29]  Cyrill Stachniss,et al.  SemanticKITTI: A Dataset for Semantic Scene Understanding of LiDAR Sequences , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[30]  Martin Simonovsky,et al.  Large-Scale Point Cloud Semantic Segmentation with Superpoint Graphs , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[31]  Raquel Urtasun,et al.  Efficient Convolutions for Real-Time Semantic Segmentation of 3D Point Clouds , 2018, 2018 International Conference on 3D Vision (3DV).

[32]  Rob Fergus,et al.  Visualizing and Understanding Convolutional Networks , 2013, ECCV.

[33]  Yin Zhou,et al.  VoxelNet: End-to-End Learning for Point Cloud Based 3D Object Detection , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[34]  S. Roth,et al.  Lightweight Probabilistic Deep Networks , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[35]  Kurt Keutzer,et al.  SqueezeSegV2: Improved Model Structure and Unsupervised Domain Adaptation for Road-Object Segmentation from a LiDAR Point Cloud , 2018, 2019 International Conference on Robotics and Automation (ICRA).

[36]  Felix Järemo Lawin,et al.  Deep Projective 3 D Semantic Segmentation , 2017 .

[37]  Jing Ye,et al.  RT3D: Real-Time 3-D Vehicle Detection in LiDAR Point Cloud for Autonomous Driving , 2018, IEEE Robotics and Automation Letters.

[38]  Subhransu Maji,et al.  SPLATNet: Sparse Lattice Networks for Point Cloud Processing , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[39]  Xiang Li,et al.  Understanding the Disharmony Between Dropout and Batch Normalization by Variance Shift , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[40]  W. Marsden I and J , 2012 .

[41]  Bichen Wu,et al.  SqueezeSegV3: Spatially-Adaptive Convolution for Efficient Point-Cloud Segmentation , 2020, ECCV.

[42]  Eren Erdal Aksoy,et al.  SalsaNet: Fast Road and Vehicle Segmentation in LiDAR Point Clouds for Autonomous Driving , 2020, 2020 IEEE Intelligent Vehicles Symposium (IV).

[43]  Kurt Keutzer,et al.  SqueezeSeg: Convolutional Neural Nets with Recurrent CRF for Real-Time Road-Object Segmentation from 3D LiDAR Point Cloud , 2017, 2018 IEEE International Conference on Robotics and Automation (ICRA).