Learning to Drive in a Day

We demonstrate the first application of deep reinforcement learning to autonomous driving. From randomly initialised parameters, our model is able to learn a policy for lane following in a handful of training episodes using a single monocular image as input. We provide a general and easy to obtain reward: the distance travelled by the vehicle without the safety driver taking control. We use a continuous, model-free deep reinforcement learning algorithm, with all exploration and optimisation performed on-vehicle. This demonstrates a new framework for autonomous driving which moves away from reliance on defined logical rules, mapping, and direct supervision. We discuss the challenges and opportunities to scale this approach to a broader range of autonomous driving tasks.
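The reward described above can be made concrete with a minimal sketch. It assumes the vehicle logs, at each control step, its forward speed, the time since the previous step and a flag marking whether the safety driver has taken control. The `Step` record, its field names and `episode_return` are hypothetical and are not taken from the paper. If the per-step reward is speed × dt and the episode ends at the first intervention, the undiscounted return equals the distance travelled without the safety driver taking control.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Step:
    # Hypothetical per-step log record (field names are assumptions, not from the paper).
    speed_mps: float         # forward speed of the vehicle, metres per second
    dt_s: float              # time elapsed since the previous control step, seconds
    driver_took_over: bool   # True once the safety driver has intervened


def episode_return(steps: List[Step]) -> float:
    """Distance (metres) travelled before the safety driver takes control.

    Each step contributes speed * dt, so summing the rewards gives distance.
    The first intervention terminates the episode: that step and every
    later step contribute nothing.
    """
    total = 0.0
    for step in steps:
        if step.driver_took_over:
            break  # intervention ends the episode
        total += step.speed_mps * step.dt_s
    return total


if __name__ == "__main__":
    log = [
        Step(2.0, 0.5, False),  # 1.0 m
        Step(4.0, 0.5, False),  # 2.0 m
        Step(3.0, 0.5, True),   # safety driver intervenes here
        Step(5.0, 1.0, False),  # ignored: episode already over
    ]
    print(episode_return(log))  # prints 3.0
```

Whether the step on which the intervention is flagged should still earn its distance is a design choice; this sketch excludes it so that motion the safety driver had to stop is never rewarded.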
