A Systematic Comparison of Deep Learning Architectures in an Autonomous Vehicle

Self-driving technology is advancing rapidly, largely due to recent developments in deep learning algorithms. To date, however, there has been no systematic comparison of how different deep learning architectures perform at such tasks, nor any attempt to correlate classification performance with performance in an actual vehicle. Here, we introduce the first controlled comparison of seven contemporary deep learning architectures on an end-to-end autonomous driving task. We use a simple, affordable platform consisting of an off-the-shelf remotely operated vehicle, a GPU-equipped computer, and an indoor foam-rubber racetrack. We compare a fully connected network, a two-layer CNN, AlexNet, VGG-16, Inception-V3, ResNet-26, and an LSTM, and report the number of laps each is able to complete without crashing while traversing the indoor racetrack under identical testing conditions. In these tests, AlexNet completed the most laps without crashing of all the networks, while ResNet-26 was the most 'efficient' architecture examined, completing the most laps relative to its number of parameters. We also examine whether spatial, color, or temporal features, or some combination of these, are most important for such tasks. Finally, we show that validation loss/accuracy is not sufficiently indicative of a model's performance in a real vehicle, even for a simple task, emphasizing the need for greater accessibility to research platforms within the self-driving community.
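The 'efficiency' measure described above (laps completed relative to parameter count) can be sketched as a simple ratio. The lap counts and parameter counts below are illustrative placeholders, not the paper's measurements:

```python
# Sketch of the laps-per-parameter "efficiency" metric described in the abstract.
# All numbers here are illustrative placeholders, NOT results from the paper.
results = {
    # architecture: (laps_completed, parameter_count)
    "AlexNet":   (10, 61_000_000),
    "ResNet-26": (8, 17_000_000),
    "VGG-16":    (6, 138_000_000),
}

def efficiency(laps: int, params: int, per: int = 1_000_000) -> float:
    """Laps completed per `per` parameters (here: per million)."""
    return laps / (params / per)

# Rank architectures by efficiency, highest first.
ranked = sorted(results, key=lambda k: efficiency(*results[k]), reverse=True)
for name in ranked:
    laps, params = results[name]
    print(f"{name}: {efficiency(laps, params):.3f} laps per million parameters")
```

Under these placeholder numbers, a smaller network such as ResNet-26 ranks first even though a larger one completed more raw laps, which is the sense in which the abstract calls it the most efficient.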
