Deep Reinforcement Learning for Autonomous Driving: A Survey

With the development of deep representation learning, reinforcement learning (RL) has become a powerful framework capable of learning complex policies in high-dimensional environments. This review summarises deep reinforcement learning (DRL) algorithms; provides a taxonomy of automated driving tasks where (D)RL methods have been employed; highlights the key challenges, both algorithmic and in deploying real-world autonomous driving agents; discusses the role of simulators in training agents; and finally covers methods to evaluate, test and robustify existing solutions in RL and imitation learning.
