Steadily Learn to Drive with Virtual Memory

Reinforcement learning (RL) has shown great potential for high-level autonomous driving. However, on high-dimensional tasks, current RL methods suffer from low data efficiency and oscillation during training. This paper proposes an algorithm called Learn to drive with Virtual Memory (LVM) to overcome these problems. LVM compresses high-dimensional observations into compact latent states and learns a latent dynamics model that summarizes the agent's experience. The latent dynamics model then generates diverse imagined latent trajectories that serve as virtual memory. The policy is learned by propagating gradients through the learned latent model along these imagined trajectories, which yields high data efficiency. Furthermore, a double-critic structure is designed to reduce oscillation during training. The effectiveness of LVM is demonstrated on an image-input autonomous driving task, in which LVM outperforms existing methods in data efficiency, learning stability, and control performance.
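To make the mechanism concrete, below is a minimal PyTorch sketch of policy learning by latent imagination with a double critic, in the spirit of the abstract. It is an illustrative reconstruction, not the paper's implementation: the MLP architectures, the 15-step horizon, the simple discounted n-step return, and all hyperparameters are assumptions, and the world-model training (encoder, dynamics, and reward losses) is omitted.

```python
# Hedged sketch: actor learning through an imagined latent rollout ("virtual
# memory") with a double critic. All sizes and losses are illustrative
# assumptions; the paper's exact networks and objectives are not reproduced.
import torch
import torch.nn as nn

LATENT_DIM, ACTION_DIM, HORIZON, GAMMA = 32, 2, 15, 0.99

def mlp(in_dim, out_dim):
    return nn.Sequential(nn.Linear(in_dim, 200), nn.ELU(),
                         nn.Linear(200, 200), nn.ELU(),
                         nn.Linear(200, out_dim))

dynamics = mlp(LATENT_DIM + ACTION_DIM, LATENT_DIM)   # latent transition z' = f(z, a)
reward   = mlp(LATENT_DIM, 1)                         # learned reward predictor r(z)
actor    = nn.Sequential(mlp(LATENT_DIM, ACTION_DIM), nn.Tanh())
critics  = nn.ModuleList([mlp(LATENT_DIM, 1) for _ in range(2)])  # double critic

actor_opt  = torch.optim.Adam(actor.parameters(), lr=8e-5)
critic_opt = torch.optim.Adam(critics.parameters(), lr=8e-5)

def imagine(z0):
    """Roll out an imagined latent trajectory (virtual memory) with the actor."""
    zs, rs = [z0], []
    z = z0
    for _ in range(HORIZON):
        a = actor(z)
        z = dynamics(torch.cat([z, a], dim=-1))
        zs.append(z)
        rs.append(reward(z))
    return torch.stack(zs), torch.stack(rs)   # (H+1, B, Z), (H, B, 1)

def update(z0):
    zs, rs = imagine(z0)
    # Actor: maximize the imagined return; gradients flow back through the
    # learned dynamics model. Taking the minimum of the two critics at the
    # bootstrap state is the double-critic device that curbs value
    # overestimation and hence training oscillation.
    ret = torch.min(critics[0](zs[-1]), critics[1](zs[-1]))
    returns = []
    for r in reversed(rs):                    # discounted returns along the rollout
        ret = r + GAMMA * ret
        returns.append(ret)
    returns = torch.stack(list(reversed(returns)))        # (H, B, 1)
    actor_loss = -returns.mean()
    actor_opt.zero_grad()
    actor_loss.backward()
    actor_opt.step()

    # Critics: regress both value heads toward the imagined returns.
    targets = returns.detach()
    critic_loss = sum(((c(zs[:-1].detach()) - targets) ** 2).mean() for c in critics)
    critic_opt.zero_grad()
    critic_loss.backward()
    critic_opt.step()

update(torch.randn(16, LATENT_DIM))  # in practice, z0 comes from the learned encoder
```

In a full system the starting latents `z0` would be produced by an encoder over camera images, and the dynamics and reward networks would be fit to real experience before imagination; here they are left untrained to keep the sketch self-contained.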
