Paper information - End-to-End Learning of Driving Models from Large-Scale Video Datasets

Robust perception-action models should be learned from training data with diverse visual appearances and realistic behaviors, yet current approaches to deep visuomotor policy learning have generally been limited to in-situ models learned from a single vehicle or simulation environment. We advocate learning a generic vehicle motion model from large-scale crowd-sourced video data, and develop an end-to-end trainable architecture that learns to predict a distribution over future vehicle egomotion from instantaneous monocular camera observations and previous vehicle state. Our model incorporates a novel FCN-LSTM architecture, which can be learned from large-scale crowd-sourced vehicle action data, and leverages available scene segmentation side tasks to improve performance under a privileged learning paradigm. We provide a novel large-scale dataset of crowd-sourced driving behavior suitable for training our model, and report results predicting the driver action on held-out sequences across diverse conditions.
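To make the input/output contract described in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes that each frame has already been reduced to a short feature vector (a stand-in for the output of the FCN encoder), that the previous vehicle state is a small numeric vector, and that egomotion is discretized into four hypothetical actions. The class name `TinyLSTMPolicy`, the action set, the dimensions, and the random untrained weights are all illustrative assumptions; bias terms and training are omitted.

```python
import math
import random

# Hypothetical discrete action set; the abstract only says "a distribution
# over future vehicle egomotion", so this discretization is an assumption.
ACTIONS = ["straight", "stop", "left", "right"]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def softmax(z):
    """Numerically stable softmax: turns scores into a probability vector."""
    m = max(z)
    e = [math.exp(x - m) for x in z]
    s = sum(e)
    return [x / s for x in e]


def init_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class TinyLSTMPolicy:
    """One LSTM cell over [visual features, previous state, hidden state],
    followed by a softmax over ACTIONS. Weights are random, not trained."""

    def __init__(self, feat_dim, state_dim, hidden_dim, seed=0):
        rng = random.Random(seed)
        in_dim = feat_dim + state_dim + hidden_dim
        self.hidden_dim = hidden_dim
        # One weight matrix per gate: input, forget, output, candidate.
        self.W = {g: init_matrix(hidden_dim, in_dim, rng) for g in "ifoc"}
        self.W_out = init_matrix(len(ACTIONS), hidden_dim, rng)

    def step(self, visual_feat, prev_state, h, c):
        # Fuse the current frame's features with the previous vehicle state
        # and the recurrent hidden state.
        x = visual_feat + prev_state + h
        i = [sigmoid(a) for a in matvec(self.W["i"], x)]
        f = [sigmoid(a) for a in matvec(self.W["f"], x)]
        o = [sigmoid(a) for a in matvec(self.W["o"], x)]
        g = [math.tanh(a) for a in matvec(self.W["c"], x)]
        c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
        h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
        probs = softmax(matvec(self.W_out, h))
        return probs, h, c


def run_sequence(policy, frames, states):
    """Return one action distribution per time step of a sequence."""
    h = [0.0] * policy.hidden_dim
    c = [0.0] * policy.hidden_dim
    out = []
    for feat, state in zip(frames, states):
        probs, h, c = policy.step(feat, state, h, c)
        out.append(probs)
    return out
```

Calling `run_sequence(TinyLSTMPolicy(3, 2, 4), frames, states)` with lists of 3-dimensional feature vectors and 2-dimensional state vectors yields, for each time step, a probability vector over `ACTIONS`. The recurrent state carries information from earlier frames forward, which is the role the abstract assigns to the LSTM component; the segmentation side task and the convolutional encoder are not modeled here.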
