Going Deeper: Autonomous Steering with Neural Memory Networks

Although autonomous driving has been extensively explored in computer vision, current deep-learning-based methods, such as approaches that map images directly to actions, do not yet produce accurate results, which makes their practical application questionable. This is largely because current state-of-the-art architectures lack the capacity to capture long-term dependencies that can model different human preferences and behaviour under different contexts. Our work explores a new paradigm in deep autonomous driving in which the model incorporates both visual input and the steering wheel trajectory, and attains a long-term planning capacity via neural memory networks. Furthermore, this work investigates optimal feature fusion techniques for combining these multimodal information sources without discarding the vital information they offer. The effectiveness of the proposed architecture is illustrated on two publicly available datasets; in both cases the proposed model demonstrates human-like behaviour under challenging situations, including illumination variations, discontinuous shoulder lines, lane merges, and divided highways, and outperforms the current state of the art.
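To make the idea of combining visual input, steering trajectory, and a memory read more concrete, the following is a minimal sketch only. It is not the architecture proposed in this work: the soft-attention read, the concatenation-based fusion, the tanh output, and every name and number below (`read_memory`, `fuse`, `predict_steering`, the toy vectors and weights) are assumptions chosen purely for illustration.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def read_memory(query: Vector, memory: List[Vector]) -> Vector:
    """Soft-attention read (assumed mechanism): average the memory slots,
    weighted by their similarity to the query."""
    weights = softmax([dot(query, slot) for slot in memory])
    dim = len(memory[0])
    return [sum(w * slot[i] for w, slot in zip(weights, memory)) for i in range(dim)]


def fuse(visual: Vector, trajectory: Vector, recalled: Vector) -> Vector:
    """Concatenation fusion (assumed): keeps every input dimension intact."""
    return visual + trajectory + recalled


def predict_steering(fused: Vector, weights: Vector, bias: float = 0.0) -> float:
    """Hypothetical linear read-out squashed to (-1, 1) as a normalised angle."""
    return math.tanh(dot(fused, weights) + bias)


# Toy values, invented for illustration only.
visual = [0.2, -0.1, 0.4]          # stand-in for an image feature vector
trajectory = [0.05, 0.07]          # stand-in for recent steering angles
memory = [[0.1, 0.0, 0.3], [-0.2, 0.5, 0.1], [0.3, -0.1, 0.4]]  # past feature slots

recalled = read_memory(visual, memory)
fused = fuse(visual, trajectory, recalled)
angle = predict_steering(fused, [0.1] * len(fused))
```

Concatenation is used here because it trivially preserves all information from each modality; the fusion techniques actually investigated in this work may differ.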
