From Pixels to Actions: Learning to Drive a Car with Deep Neural Networks

Self-driving cars promise several advantages, e.g. the ability to outperform human drivers while being safer. Here we take a closer look at some aspects of the algorithms aimed at making this promise a reality. More specifically, we analyze an end-to-end neural network that predicts a car's steering actions on a highway from images taken by a single car-mounted camera. We focus our analysis on three aspects that could have a significant impact on the performance of the system: the input data format, the temporal dependencies between consecutive inputs, and the origin of the data. For the first aspect, we show that, for the task at hand, regression networks outperform their classifier counterparts. In addition, there seems to be only a small difference between networks that use coloured images and ones that use grayscale images as input. For the second aspect, feeding the network three concatenated images yields a significant decrease of 30% in mean squared error. For the third aspect, by using simulation data we are able to train networks whose performance is comparable to that of networks trained on real-life datasets. We also qualitatively demonstrate that the standard metrics used to evaluate networks do not necessarily reflect a system's driving behaviour accurately: a promising confusion matrix may result in poor driving behaviour, while a very poor-looking confusion matrix may result in good driving behaviour.
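To make the quantities mentioned above concrete, the following is a minimal sketch, not the implementation used in this work. It shows one possible way to group three consecutive frames into a single input, to compute the mean squared error of a regression output, and to discretise steering values into classes for a confusion matrix. All function names, the bin edges and the toy steering values are assumptions chosen for illustration only; the abstract does not specify how frames are concatenated or how steering classes are defined.

```python
from typing import List, Sequence, Tuple


def stack_consecutive(frames: Sequence, n: int = 3) -> List[Tuple]:
    """Group every frame with its n-1 predecessors (sliding window)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return [tuple(frames[i - n + 1 : i + 1]) for i in range(n - 1, len(frames))]


def mean_squared_error(predicted: Sequence[float], target: Sequence[float]) -> float:
    """Average squared difference between predictions and targets."""
    if len(predicted) != len(target) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(predicted)


def angle_to_class(angle: float, edges: Sequence[float]) -> int:
    """Map a steering value to a bin index; edges must be sorted ascending."""
    return sum(angle >= edge for edge in edges)


def confusion_matrix(true_classes: Sequence[int],
                     predicted_classes: Sequence[int],
                     num_classes: int) -> List[List[int]]:
    """Rows index the true class, columns index the predicted class."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(true_classes, predicted_classes):
        matrix[t][p] += 1
    return matrix


# Toy data (hypothetical): frame placeholders and steering values.
frames = ["f0", "f1", "f2", "f3", "f4"]
windows = stack_consecutive(frames, n=3)

targets = [-0.2, 0.0, 0.3]
predictions = [-0.1, 0.0, 0.1]
mse = mean_squared_error(predictions, targets)

# Assumed bins: left (< -0.1), straight, right (>= 0.1).
edges = [-0.1, 0.1]
true_cls = [angle_to_class(a, edges) for a in targets]
pred_cls = [angle_to_class(a, edges) for a in predictions]
cm = confusion_matrix(true_cls, pred_cls, num_classes=3)
```

In this toy case the regression error is small, yet one of the three samples already lands in the wrong class because it sits on a bin edge. This is only meant to illustrate why a confusion matrix and a regression error can give different impressions of the same predictions; it says nothing about the actual results reported here.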
