Fusing LiDAR and Color Imagery for Object Detection using Convolutional Neural Networks

The goal of this paper is answer to this question: how much fusing LiDAR and color images can improve the performance of a convolutional neural network (CNN)-based detector? To this end, we trained state-of-the-art CNN-based detectors using different configurations of color images and their associated LiDAR data, in conjunction and independently. Moreover, we investigate the effect of sparse and dense LiDAR data on the detection accuracy. For this purpose, we estimate a dense depth image from spare LiDAR data using a recent self-supervised depth completion technique [1] that requires only sequences of color and sparse depth images, without the need for dense depth labels. Then, we compared two detectors when are trained on sparse or dense LiDAR data. The obtained results on the KITTI dataset show that fusing dense LiDAR and color images is an efficient solution for future object detectors.

[1]  Dushyant Rao,et al.  Vote3Deep: Fast object detection in 3D point clouds using efficient convolutional neural networks , 2016, 2017 IEEE International Conference on Robotics and Automation (ICRA).

[2]  Fahimeh Farahnakian,et al.  Deep Convolutional Neural Network-based Fusion of RGB and IR Images in Marine Environment , 2019, 2019 IEEE Intelligent Transportation Systems Conference (ITSC).

[3]  Ross B. Girshick,et al.  Focal Loss for Dense Object Detection , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[4]  Jukka Heikkonen,et al.  Visible and Infrared Image Fusion Framework based on RetinaNet for Marine Environment , 2019, 2019 22th International Conference on Information Fusion (FUSION).

[5]  Yinda Zhang,et al.  Deep Depth Completion of a Single RGB-D Image , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[6]  Huimin Ma,et al.  3D Object Proposals for Accurate Object Class Detection , 2015, NIPS.

[7]  Andreas Geiger,et al.  Are we ready for autonomous driving? The KITTI vision benchmark suite , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[8]  Sergey Ioffe,et al.  Rethinking the Inception Architecture for Computer Vision , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[9]  Yin Zhou,et al.  VoxelNet: End-to-End Learning for Point Cloud Based 3D Object Detection , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[10]  Bo Li,et al.  Monocular Depth Estimation with Hierarchical Fusion of Dilated CNNs and Soft-Weighted-Sum Inference , 2017, Pattern Recognit..

[11]  Bo Chen,et al.  MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications , 2017, ArXiv.

[12]  Sergey Ioffe,et al.  Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning , 2016, AAAI.

[13]  Ross B. Girshick,et al.  Fast R-CNN , 2015, 1504.08083.

[14]  Wei Liu,et al.  SSD: Single Shot MultiBox Detector , 2015, ECCV.

[15]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[16]  Jia-Bin Huang,et al.  DF-Net: Unsupervised Joint Learning of Depth and Flow using Cross-Task Consistency , 2018, ECCV.

[17]  Gabriel J. Brostow,et al.  Digging Into Self-Supervised Monocular Depth Estimation , 2018, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[18]  G LoweDavid,et al.  Distinctive Image Features from Scale-Invariant Keypoints , 2004 .

[19]  Thomas Brox,et al.  DeMoN: Depth and Motion Network for Learning Monocular Stereo , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[20]  Sergio Guadarrama,et al.  Speed/Accuracy Trade-Offs for Modern Convolutional Object Detectors , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[21]  Zsolt Kira,et al.  Fusing LIDAR and images for pedestrian detection using convolutional neural networks , 2016, 2016 IEEE International Conference on Robotics and Automation (ICRA).

[22]  Sertac Karaman,et al.  Self-Supervised Sparse-to-Dense: Self-Supervised Depth Completion from LiDAR and Monocular Camera , 2018, 2019 International Conference on Robotics and Automation (ICRA).

[23]  Ali Farhadi,et al.  YOLOv3: An Incremental Improvement , 2018, ArXiv.

[24]  Jitendra Malik,et al.  Region-Based Convolutional Networks for Accurate Object Detection and Segmentation , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[25]  Bill Triggs,et al.  Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[26]  Luc Van Gool,et al.  Sparse and Noisy LiDAR Completion with RGB Guidance and Uncertainty , 2019, 2019 16th International Conference on Machine Vision Applications (MVA).

[27]  Jukka Heikkonen,et al.  Object Detection Based on Multi-sensor Proposal Fusion in Maritime Environment , 2018, 2018 17th IEEE International Conference on Machine Learning and Applications (ICMLA).

[28]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[29]  Lennart Svensson,et al.  LIDAR-Camera Fusion for Road Detection Using Fully Convolutional Neural Networks , 2018, Robotics Auton. Syst..