Leveraging Heteroscedastic Aleatoric Uncertainties for Robust Real-Time LiDAR 3D Object Detection

We present a robust real-time LiDAR 3D object detector that leverages heteroscedastic aleatoric uncertainties to significantly improve its detection performance. A multi-loss function is designed to incorporate uncertainty estimations predicted by auxiliary output layers. Using our proposed method, the network ignores to train from noisy samples, and focuses more on informative ones. We validate our method on the KITTI object detection benchmark. Our method surpasses the baseline method which does not explicitly estimate uncertainties by up to nearly 9% in terms of Average Precision (AP). It also produces state-of-the-art results compared to other methods, while running with an inference time of only 72ms. In addition, we conduct extensive experiments to understand how aleatoric uncertainties behave. Extracting aleatoric uncertainties brings almost no additional computation cost during the deployment, making our method highly desirable for autonomous driving applications.

[1]  Yarin Gal,et al.  Uncertainty in Deep Learning , 2016 .

[2]  Zoubin Ghahramani,et al.  Deep Bayesian Active Learning with Image Data , 2017, ICML.

[3]  Jean-Philippe Thiran,et al.  Combining LiDAR space clustering and convolutional neural networks for pedestrian detection , 2017, 2017 14th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS).

[4]  Joydeep Ghosh,et al.  Robust detection of non-motorized road users using deep learning on optical and LIDAR data , 2016, 2016 IEEE 19th International Conference on Intelligent Transportation Systems (ITSC).

[5]  Marcelo H. Ang,et al.  A General Pipeline for 3D Detection of Vehicles , 2018, 2018 IEEE International Conference on Robotics and Automation (ICRA).

[6]  Marcelo H. Ang,et al.  Car detection for autonomous vehicle: LIDAR and vision fusion approach through deep learning framework , 2017, 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[7]  Leonidas J. Guibas,et al.  Frustum PointNets for 3D Object Detection from RGB-D Data , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[8]  Tian Xia,et al.  Vehicle Detection from 3D Lidar Using Fully Convolutional Network , 2016, Robotics: Science and Systems.

[9]  Michael Kampffmeyer,et al.  Semantic Segmentation of Small Objects and Modeling of Uncertainty in Urban Remote Sensing Images Using Deep Convolutional Neural Networks , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[10]  Ingmar Posner,et al.  Voting for Voting in Online Point Cloud Object Detection , 2015, Robotics: Science and Systems.

[11]  Yin Zhou,et al.  VoxelNet: End-to-End Learning for Point Cloud Based 3D Object Detection , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[12]  Alex Kendall,et al.  What Uncertainties Do We Need in Bayesian Deep Learning for Computer Vision? , 2017, NIPS.

[13]  Kaiming He,et al.  Focal Loss for Dense Object Detection , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[14]  Andreas Geiger,et al.  Computer Vision for Autonomous Vehicles: Problems, Datasets and State-of-the-Art , 2017, Found. Trends Comput. Graph. Vis..

[15]  Bo Li,et al.  3D fully convolutional network for vehicle detection in point cloud , 2016, 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[16]  David J. C. MacKay,et al.  A Practical Bayesian Framework for Backpropagation Networks , 1992, Neural Computation.

[17]  Alex Graves,et al.  Practical Variational Inference for Neural Networks , 2011, NIPS.

[18]  Dushyant Rao,et al.  Vote3Deep: Fast object detection in 3D point clouds using efficient convolutional neural networks , 2016, 2017 IEEE International Conference on Robotics and Automation (ICRA).

[19]  Roberto Cipolla,et al.  Modelling uncertainty in deep learning for camera relocalization , 2015, 2016 IEEE International Conference on Robotics and Automation (ICRA).

[20]  Ross B. Girshick,et al.  Focal Loss for Dense Object Detection , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[21]  Andreas Geiger,et al.  Are we ready for autonomous driving? The KITTI vision benchmark suite , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[22]  Klaus C. J. Dietmayer,et al.  Towards Safe Autonomous Driving: Capture Uncertainty in the Deep Neural Network For Lidar 3D Vehicle Detection , 2018, 2018 21st International Conference on Intelligent Transportation Systems (ITSC).

[23]  Ji Wan,et al.  Multi-view 3D Object Detection Network for Autonomous Driving , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[24]  Leonidas J. Guibas,et al.  PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[25]  Shinpei Kato,et al.  LMNet: Real-time Multiclass Object Detection on CPU Using 3D LiDAR , 2018, 2018 3rd Asia-Pacific Conference on Intelligent Robot Systems (ACIRS).

[26]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[27]  Kurt Keutzer,et al.  SqueezeSeg: Convolutional Neural Nets with Recurrent CRF for Real-Time Road-Object Segmentation from 3D LiDAR Point Cloud , 2017, 2018 IEEE International Conference on Robotics and Automation (ICRA).

[28]  Lennart Svensson,et al.  Fast LIDAR-based road detection using fully convolutional neural networks , 2017, 2017 IEEE Intelligent Vehicles Symposium (IV).

[29]  Benjamin Van Roy,et al.  Deep Exploration via Bootstrapped DQN , 2016, NIPS.

[30]  Steven Lake Waslander,et al.  Joint 3D Proposal Generation and Object Detection from View Aggregation , 2017, 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[31]  Danfei Xu,et al.  PointFusion: Deep Sensor Fusion for 3D Bounding Box Estimation , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[32]  Bin Yang,et al.  PIXOR: Real-time 3D Object Detection from Point Clouds , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[33]  Thomas Brox,et al.  Uncertainty Estimates and Multi-hypotheses Networks for Optical Flow , 2018, ECCV.

[34]  Niko Sünderhauf,et al.  Dropout Sampling for Robust Object Detection in Open-Set Conditions , 2017, 2018 IEEE International Conference on Robotics and Automation (ICRA).

[35]  Roberto Cipolla,et al.  Multi-task Learning Using Uncertainty to Weigh Losses for Scene Geometry and Semantics , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[36]  Klaus C. J. Dietmayer,et al.  Optimal Sensor Data Fusion Architecture for Object Detection in Adverse Weather Conditions , 2018, 2018 21st International Conference on Information Fusion (FUSION).

[37]  Andreas Nürnberger,et al.  The Power of Ensembles for Active Learning in Image Classification , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[38]  Fernando García,et al.  BirdNet: A 3D Object Detection Framework from LiDAR Information , 2018, 2018 21st International Conference on Intelligent Transportation Systems (ITSC).

[39]  Kazunori Ohno,et al.  Vehicle detection and localization on bird's eye view elevation images using convolutional neural network , 2017, 2017 IEEE International Symposium on Safety, Security and Rescue Robotics (SSRR).

[40]  Zsolt Kira,et al.  Fusing LIDAR and images for pedestrian detection using convolutional neural networks , 2016, 2016 IEEE International Conference on Robotics and Automation (ICRA).

[41]  Paulo Peixoto,et al.  DepthCN: Vehicle detection using 3D-LIDAR and ConvNet , 2017, 2017 IEEE 20th International Conference on Intelligent Transportation Systems (ITSC).

[42]  Jing Ye,et al.  RT3D: Real-Time 3-D Vehicle Detection in LiDAR Point Cloud for Autonomous Driving , 2018, IEEE Robotics and Automation Letters.

[43]  Roberto Cipolla,et al.  Bayesian SegNet: Model Uncertainty in Deep Convolutional Encoder-Decoder Architectures for Scene Understanding , 2015, BMVC.

[44]  Geoffrey E. Hinton,et al.  Keeping the neural networks simple by minimizing the description length of the weights , 1993, COLT '93.