Fusion of LiDAR and Camera Images in End-to-end Deep Learning for Steering an Off-road Unmanned Ground Vehicle

We consider the task of learning a steering policy for an off-road autonomous vehicle with deep learning. The goal is to train a system end-to-end to predict steering commands from the input images delivered by a single optical camera and a LiDAR sensor. To achieve this, we propose a neural network-based information fusion approach and study several architectures. In one study focusing on late fusion, we investigate a system comprising two convolutional networks and a fully-connected network. The convolutional nets are trained on camera images and LiDAR images, respectively, whereas the fully-connected net is trained on the combined features from these two networks. Our experimental results show that fusing camera and LiDAR information yields more accurate steering predictions on our dataset than considering each data source separately. In another study we consider several early-fusion architectures that circumvent the expensive full concatenation at the raw-image level. Even though the proposed early-fusion approaches performed better than the unimodal systems, they were significantly inferior to the best late-fusion system. Overall, fusing camera and LiDAR images in an off-road setting brings the normalized RMSE down to a range comparable to that reported for on-road environments.
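To make the late-fusion design concrete, below is a minimal PyTorch sketch of the two-branch architecture the abstract describes: one convolutional feature extractor per modality, with a fully-connected head regressing the steering command from the concatenated features. All class names (ModalityCNN, LateFusionSteeringNet), layer sizes, channel counts, and the 66x200 input resolution are illustrative assumptions, not the paper's exact configuration.

```python
# Hedged sketch of a late-fusion steering network; sizes are assumptions.
import torch
import torch.nn as nn

class ModalityCNN(nn.Module):
    """Small convolutional feature extractor for one input modality
    (an RGB camera frame or a LiDAR scan rendered as a 2-D image)."""
    def __init__(self, in_channels: int, feat_dim: int = 128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_channels, 24, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(24, 36, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(36, 48, kernel_size=3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # -> (N, 48, 1, 1)
        )
        self.proj = nn.Linear(48, feat_dim)

    def forward(self, x):
        return self.proj(self.conv(x).flatten(1))  # -> (N, feat_dim)

class LateFusionSteeringNet(nn.Module):
    """Concatenates per-modality features and regresses a single
    steering command with a fully-connected head (late fusion)."""
    def __init__(self, feat_dim: int = 128):
        super().__init__()
        self.camera_net = ModalityCNN(in_channels=3, feat_dim=feat_dim)
        self.lidar_net = ModalityCNN(in_channels=1, feat_dim=feat_dim)
        self.head = nn.Sequential(
            nn.Linear(2 * feat_dim, 64), nn.ReLU(),
            nn.Linear(64, 1),  # predicted steering angle
        )

    def forward(self, camera_img, lidar_img):
        fused = torch.cat([self.camera_net(camera_img),
                           self.lidar_net(lidar_img)], dim=1)
        return self.head(fused)

# Usage with dummy batches (resolution is an assumption):
model = LateFusionSteeringNet()
cam = torch.randn(4, 3, 66, 200)  # RGB camera frames
lid = torch.randn(4, 1, 66, 200)  # LiDAR range images
steer = model(cam, lid)           # -> shape (4, 1)
```

An early-fusion variant would instead stack the camera and LiDAR channels into one tensor before the first convolution, avoiding the two separate branches but operating on the full raw-image concatenation.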
