Deep Surrogate Q-Learning for Autonomous Driving

Two challenges for applying deep reinforcement learning to real systems are adaptivity to changing environments and efficiency with respect to computational resources and data. When learning lane-change behavior for autonomous driving, agents have to deal with a varying number of surrounding vehicles. Furthermore, the number of required transitions imposes a bottleneck, since test drivers cannot perform an arbitrary number of lane changes in the real world. In the off-policy setting, additional information about the task can be gained by observing the actions of other drivers. While this knowledge remains unused in the classical RL setup, we use other drivers as surrogates to learn the agent's value function more efficiently. We propose Surrogate Q-learning, which addresses the aforementioned problems and drastically reduces the required driving time. We further propose an efficient implementation based on a permutation-equivariant deep neural network architecture of the Q-function that estimates action-values for a variable number of vehicles in sensor range. We show that this architecture leads to a novel replay sampling technique we call Scene-centric Experience Replay, and we evaluate the performance of Surrogate Q-learning and Scene-centric Experience Replay in the open-source traffic simulator SUMO. Additionally, we show that our methods enhance the real-world applicability of RL systems by learning policies on the real highD dataset.
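To make the architectural idea concrete, the following is a minimal sketch of how a permutation-equivariant Q-network over a variable-size set of surrounding vehicles could look, in the spirit of Deep Sets [7]. It is not the authors' implementation: the class name SetQNetwork, the feature dimensions, the zero-padding mask, sum-pooling, and the choice of three discrete lane-change actions are all illustrative assumptions.

```python
# Sketch (assumptions, not the paper's code): a shared encoder phi embeds each
# surrounding vehicle, a pooled set representation plus the ego features forms a
# scene context, and rho maps each vehicle's embedding together with that context
# to per-vehicle action-values. Permuting the vehicles permutes the output rows
# in the same way (permutation equivariance), and zero-padding with a mask
# handles a variable number of vehicles in sensor range.
import torch
import torch.nn as nn


class SetQNetwork(nn.Module):
    def __init__(self, ego_dim=4, vehicle_dim=5, hidden_dim=64, n_actions=3):
        super().__init__()
        # phi: applied independently to every surrounding vehicle
        self.phi = nn.Sequential(
            nn.Linear(vehicle_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
        )
        # rho: per-vehicle embedding + pooled scene context + ego features -> Q-values
        self.rho = nn.Sequential(
            nn.Linear(2 * hidden_dim + ego_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, n_actions),
        )

    def forward(self, ego, vehicles, mask):
        # ego:      (batch, ego_dim)
        # vehicles: (batch, max_vehicles, vehicle_dim), zero-padded
        # mask:     (batch, max_vehicles), 1.0 for real vehicles, 0.0 for padding
        encoded = self.phi(vehicles)                           # (B, N, H)
        pooled = (encoded * mask.unsqueeze(-1)).sum(dim=1)     # (B, H), order-invariant
        context = torch.cat([pooled, ego], dim=-1)             # (B, H + ego_dim)
        context = context.unsqueeze(1).expand(-1, vehicles.size(1), -1)
        per_vehicle = torch.cat([encoded, context], dim=-1)    # (B, N, 2H + ego_dim)
        return self.rho(per_vehicle)                           # (B, N, n_actions)


if __name__ == "__main__":
    q_net = SetQNetwork()
    ego = torch.randn(2, 4)
    vehicles = torch.randn(2, 6, 5)                            # up to 6 vehicles in range
    mask = torch.tensor([[1, 1, 1, 0, 0, 0],                   # scene 1: 3 vehicles
                         [1, 1, 1, 1, 1, 1]], dtype=torch.float32)
    print(q_net(ego, vehicles, mask).shape)                    # torch.Size([2, 6, 3])
```

Because one forward pass yields action-values for every vehicle in the scene rather than for the ego vehicle alone, such an architecture is a natural fit for treating observed drivers as surrogates and for sampling replay data per scene rather than per agent, which is the intuition the abstract describes for Scene-centric Experience Replay.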

[1] Matthias Althoff et al. High-level Decision Making for Safe and Reasonable Autonomous Lane Changing using Reinforcement Learning, 2018, 2018 21st International Conference on Intelligent Transportation Systems (ITSC).

[2] Gabriel Kalweit et al. Deep Inverse Q-learning with Constraints, 2020, NeurIPS.

[3] Jimmy Ba et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.

[4] Gabriel Kalweit et al. Deep Constrained Q-learning, 2020.

[5] Herke van Hoof et al. Addressing Function Approximation Error in Actor-Critic Methods, 2018, ICML.

[6] Martin A. Riedmiller et al. Embed to Control: A Locally Linear Latent Dynamics Model for Control from Raw Images, 2015, NIPS.

[7] Alexander J. Smola et al. Deep Sets, 2017, arXiv:1703.06114.

[8] Tom Schaul et al. Prioritized Experience Replay, 2015, ICLR.

[9] Lex Fridman et al. DeepTraffic: Crowdsourced Hyperparameter Tuning of Deep Reinforcement Learning Systems for Multi-Agent Dense Traffic Navigation, 2018.

[10] Florian Kuhnt et al. Adaptive Behavior Generation for Autonomous Driving using Deep Reinforcement Learning with Compact Semantic States, 2018, 2018 IEEE Intelligent Vehicles Symposium (IV).

[11] Gabriel Kalweit et al. Dynamic Interaction-Aware Scene Understanding for Reinforcement Learning in Autonomous Driving, 2019, 2020 IEEE International Conference on Robotics and Automation (ICRA).

[12] Demis Hassabis et al. Mastering the game of Go with deep neural networks and tree search, 2016, Nature.

[13] Kikuo Fujimura et al. Tactical Decision Making for Lane Changing with Deep Reinforcement Learning, 2017.

[14] Gabriel Kalweit et al. Dynamic Input for Deep Reinforcement Learning in Autonomous Driving, 2019, 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[15] Elmira Amirloo Abolfathi et al. Towards Practical Hierarchical Reinforcement Learning for Multi-lane Autonomous Driving, 2018.

[16] Shane Legg et al. Human-level control through deep reinforcement learning, 2015, Nature.

[17] Lutz Eckstein et al. The highD Dataset: A Drone Dataset of Naturalistic Vehicle Trajectories on German Highways for Validation of Highly Automated Driving Systems, 2018, 2018 21st International Conference on Intelligent Transportation Systems (ITSC).

[18] K. Madhava Krishna et al. Overtaking Maneuvers in Simulated Highway Driving using Deep Reinforcement Learning, 2018, 2018 IEEE Intelligent Vehicles Symposium (IV).

[19] Sergey Levine et al. End-to-End Training of Deep Visuomotor Policies, 2015, J. Mach. Learn. Res.

[20] Daniel Krajzewicz et al. Recent Development and Applications of SUMO - Simulation of Urban MObility, 2012.

[21] Peter Dayan et al. Q-learning, 1992, Machine Learning.

[22] Moritz Werling et al. Reinforcement Learning for Autonomous Maneuvering in Highway Scenarios, 2017.

[23] Alex Fridman et al. DeepTraffic: Driving Fast through Dense Traffic with Deep Reinforcement Learning, 2018, arXiv.