Multimodal Deep Learning for Advanced Driving Systems

Multimodal deep learning is about learning features over multiple modalities. Impressive progress has been made in deep learning solutions that rely on a single sensor modality for advanced driving. However, these approaches are limited to cover certain functionalities. The potential of multimodal sensor fusion has been very little exploited, although research vehicles are commonly provided with various sensor types. How to combine their data to achieve a complex scene analysis and improve therefore robustness in driving is still an open question. While different surveys have been done for intelligent vehicles or deep learning, to date no survey on multimodal deep learning for advanced driving exists. This paper attempts to narrow this gap by providing the first review that analyzes existing literature and two indispensable elements: sensors and datasets. We also provide our insights on future challenges and work to be done.

[1]  Shu Wang,et al.  Multispectral Deep Neural Networks for Pedestrian Detection , 2016, BMVC.

[2]  Yang Gao,et al.  End-to-End Learning of Driving Models from Large-Scale Video Datasets , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[3]  Andreas Geiger,et al.  Are we ready for autonomous driving? The KITTI vision benchmark suite , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[4]  Wolfram Burgard,et al.  Efficient deep models for monocular road segmentation , 2016, 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[5]  Jaskaran Virdi Using Deep Learning to Predict Obstacle Trajectories for Collision Avoidance in Autonomous Vehicles , 2017 .

[6]  Antonio M. López,et al.  The SYNTHIA Dataset: A Large Collection of Synthetic Images for Semantic Segmentation of Urban Scenes , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[7]  Liang Xiao,et al.  Hybrid conditional random field based camera-LIDAR fusion for road detection , 2017, Inf. Sci..

[8]  Ashish Kapoor,et al.  AirSim: High-Fidelity Visual and Physical Simulation for Autonomous Vehicles , 2017, FSR.

[9]  Karl Zipser,et al.  Multi-Modal Multi-Task Deep Learning for Autonomous Driving , 2017, ArXiv.

[10]  Wei Zhan,et al.  Fusing Bird’s Eye View LIDAR Point Cloud and Front View Camera Image for 3D Object Detection , 2018, 2018 IEEE Intelligent Vehicles Symposium (IV).

[11]  Yu Liu,et al.  A review of semantic segmentation using deep neural networks , 2017, International Journal of Multimedia Information Retrieval.

[12]  François Goulette,et al.  Paris-Lille-3D: A large and high-quality ground-truth urban point cloud dataset for automatic segmentation and classification , 2017, Int. J. Robotics Res..

[13]  S. Ullman Against direct perception , 1980, Behavioral and Brain Sciences.

[14]  Anna Choromanska,et al.  Sensor modality fusion with CNNs for UGV autonomous driving in indoor environments , 2017, 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[15]  Xin Zhang,et al.  End to End Learning for Self-Driving Cars , 2016, ArXiv.

[16]  Min Bai,et al.  TorontoCity: Seeing the World with a Million Eyes , 2016, 2017 IEEE International Conference on Computer Vision (ICCV).

[17]  Thierry Chateau,et al.  Deep MANTA: A Coarse-to-Fine Many-Task Network for Joint 2D and 3D Vehicle Analysis from Monocular Image , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[18]  Paul Newman,et al.  1 year, 1000 km: The Oxford RobotCar dataset , 2017, Int. J. Robotics Res..

[19]  Sebastian Ramos,et al.  The Cityscapes Dataset for Semantic Urban Scene Understanding , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[20]  François Goulette,et al.  Paris-Lille-3D: A Point Cloud Dataset for Urban Scene Segmentation and Classification , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[21]  A. Carullo,et al.  An ultrasonic sensor for distance measurement in automotive applications , 2001, IEEE Sensors Journal.

[22]  Markus Hahn,et al.  Potential of radar for static object classification using deep learning methods , 2016, 2016 IEEE MTT-S International Conference on Microwaves for Intelligent Mobility (ICMIM).

[23]  Jana Kosecka,et al.  3D Bounding Box Estimation Using Deep Learning and Geometry , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[24]  Vladlen Koltun,et al.  Playing for Data: Ground Truth from Computer Games , 2016, ECCV.

[25]  Bo Li,et al.  3D fully convolutional network for vehicle detection in point cloud , 2016, 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[26]  Dong Xu,et al.  Advanced Deep-Learning Techniques for Salient and Category-Specific Object Detection: A Survey , 2018, IEEE Signal Processing Magazine.

[27]  Steven Lake Waslander,et al.  Joint 3D Proposal Generation and Object Detection from View Aggregation , 2017, 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[28]  Vladlen Koltun,et al.  Multi-Scale Context Aggregation by Dilated Convolutions , 2015, ICLR.

[29]  Yin Zhou,et al.  VoxelNet: End-to-End Learning for Point Cloud Based 3D Object Detection , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[30]  In So Kweon,et al.  KAIST Multi-Spectral Day/Night Data Set for Autonomous and Assisted Driving , 2018, IEEE Transactions on Intelligent Transportation Systems.

[31]  Ji Wan,et al.  Multi-view 3D Object Detection Network for Autonomous Driving , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[32]  Yadong Mu,et al.  Learning End-to-End Autonomous Steering Model from Spatial and Temporal Visual Cues , 2017, VSCC '17.

[33]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[34]  Juhan Nam,et al.  Multimodal Deep Learning , 2011, ICML.

[35]  Peter Kontschieder,et al.  The Mapillary Vistas Dataset for Semantic Understanding of Street Scenes , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[36]  Hiroaki Takada,et al.  Implementation and Evaluation of Local Dynamic Map in Safety Driving Systems , 2015 .

[37]  Eder Santana,et al.  Learning a Driving Simulator , 2016, ArXiv.

[38]  Wei Zhan,et al.  Fusing Bird View LIDAR Point Cloud and Front View Camera Image for Deep Object Detection , 2017, ArXiv.