Early Fusion of Camera and Lidar for robust road detection based on U-Net FCN

Automated vehicles rely on the accurate and robust detection of the drivable area, often classified into free space, road area, and lane information. Most current approaches use monocular or stereo cameras for this task. However, LiDAR sensors are becoming more common and offer unique properties for road area detection, such as precision and robustness to adverse weather conditions. We therefore propose two approaches for pixel-wise binary semantic segmentation of the road area, both based on a modified U-Net Fully Convolutional Network (FCN) architecture. The first approach, UView-Cam, operates on a single camera image, whereas the second approach, UGrid-Fused, performs an early fusion of LiDAR and camera data into a multi-dimensional occupancy grid representation that serves as the FCN input. Fusing camera and LiDAR data allows the complementary properties of both sensors to be leveraged efficiently and robustly in a single FCN. UView-Cam is trained on multiple publicly available street-scene datasets, while UGrid-Fused is trained on the KITTI dataset. On the KITTI Road/Lane Detection benchmark, the proposed networks reach MaxF scores of 94.23% and 93.81%, respectively. Both approaches run in real time at a detection rate of about 10 Hz.
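
The abstract does not specify the exact network configuration, but the following PyTorch sketch illustrates the general idea: an early-fused, multi-channel camera/LiDAR grid is fed into a U-Net-style FCN that outputs a single-channel road mask. The channel layout (3 RGB camera channels plus 3 hypothetical LiDAR-derived grid channels such as height, density, and intensity), the layer widths, and the input resolution are illustrative assumptions, not the configuration of UView-Cam or UGrid-Fused.

```python
# Minimal sketch of a U-Net-style FCN for binary road segmentation with an
# early-fused camera/LiDAR input. Channel layout and layer sizes are assumed
# for illustration; they are not taken from the paper.
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch):
    """Two 3x3 convolutions with ReLU, as in the original U-Net."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
        nn.ReLU(inplace=True),
    )

class RoadSegmentationUNet(nn.Module):
    def __init__(self, in_channels=6):  # 3 camera + 3 LiDAR grid channels (assumed)
        super().__init__()
        self.enc1 = conv_block(in_channels, 32)
        self.enc2 = conv_block(32, 64)
        self.enc3 = conv_block(64, 128)
        self.pool = nn.MaxPool2d(2)
        self.bottleneck = conv_block(128, 256)
        self.up3 = nn.ConvTranspose2d(256, 128, kernel_size=2, stride=2)
        self.dec3 = conv_block(256, 128)
        self.up2 = nn.ConvTranspose2d(128, 64, kernel_size=2, stride=2)
        self.dec2 = conv_block(128, 64)
        self.up1 = nn.ConvTranspose2d(64, 32, kernel_size=2, stride=2)
        self.dec1 = conv_block(64, 32)
        self.head = nn.Conv2d(32, 1, kernel_size=1)  # single-channel road logit

    def forward(self, x):
        e1 = self.enc1(x)
        e2 = self.enc2(self.pool(e1))
        e3 = self.enc3(self.pool(e2))
        b = self.bottleneck(self.pool(e3))
        d3 = self.dec3(torch.cat([self.up3(b), e3], dim=1))   # skip connection
        d2 = self.dec2(torch.cat([self.up2(d3), e2], dim=1))  # skip connection
        d1 = self.dec1(torch.cat([self.up1(d2), e1], dim=1))  # skip connection
        return self.head(d1)  # sigmoid + threshold yields the binary road mask

# Example: early-fused input of shape (batch, 6, H, W); H and W must be
# divisible by 8 here because of the three 2x2 pooling stages.
fused = torch.randn(1, 6, 192, 640)
road_logits = RoadSegmentationUNet(in_channels=6)(fused)  # -> (1, 1, 192, 640)
```

For the binary road/non-road task described in the abstract, a pixel-wise binary cross-entropy (or similar) loss on the road mask would be the natural training objective for a network of this form.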
