Amortized Q-learning with Model-based Action Proposals for Autonomous Driving on Highways

Well-established optimization-based methods can guarantee an optimal trajectory for a short optimization horizon, typically no longer than a few seconds. As a result, choosing the optimal trajectory over this short horizon may still yield a sub-optimal long-term solution. At the same time, the resulting short-term trajectories allow for effective, comfortable, and provably safe maneuvers in a dynamic traffic environment. In this work, we address the question of how to ensure an optimal long-term driving strategy while keeping the benefits of classical trajectory planning. We introduce a Reinforcement Learning-based approach that, coupled with a trajectory planner, learns an optimal long-term decision-making strategy for driving on highways. By generating locally optimal maneuvers online and treating them as actions, we strike a balance between an infinite low-level continuous action space and the limited flexibility of a small set of predefined, standard lane-change actions. We evaluated our method on realistic scenarios in the open-source traffic simulator SUMO and achieved better performance than the four benchmark approaches we compared against: a random-action agent, a greedy agent, an agent with high-level discrete actions, and an IDM-based, SUMO-controlled agent.
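To make the decision loop concrete, the following is a minimal sketch (in PyTorch) of Q-learning over planner-generated action proposals: a planner produces a finite set of candidate maneuvers, a Q-network scores each state-maneuver pair, and the agent executes the argmax. All names here (`QNetwork`, `propose_maneuvers`, `select_action`), the network architecture, and the maneuver parameterization are illustrative assumptions, not the paper's implementation; in particular, the random proposal generator merely stands in for the actual model-based trajectory planner.

```python
import torch
import torch.nn as nn


class QNetwork(nn.Module):
    """Scores (state, maneuver) pairs; the architecture is illustrative only."""

    def __init__(self, state_dim, action_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state, actions):
        # state: (state_dim,), actions: (n_proposals, action_dim)
        s = state.unsqueeze(0).expand(actions.shape[0], -1)
        return self.net(torch.cat([s, actions], dim=-1)).squeeze(-1)


def propose_maneuvers(state, n_proposals=8, action_dim=4):
    """Stand-in for the model-based planner. In the paper's setting this would
    return locally optimal short-horizon trajectories generated online; here we
    just sample random parameter vectors (hypothetical parameterization, e.g.
    target lane, target velocity, maneuver duration)."""
    return torch.rand(n_proposals, action_dim)


def select_action(q_net, state):
    """Amortized maximization: evaluate Q only on the planner's proposals and
    pick the argmax, instead of maximizing over the continuous action space."""
    proposals = propose_maneuvers(state)
    with torch.no_grad():
        q_values = q_net(state, proposals)
    return proposals[q_values.argmax()]


# Toy usage with made-up dimensions.
state_dim, action_dim = 16, 4
q_net = QNetwork(state_dim, action_dim)
state = torch.rand(state_dim)
maneuver = select_action(q_net, state)
print("selected maneuver parameters:", maneuver)
```

Because the maximization is restricted to the proposal set, every executable action remains a feasible, locally optimal trajectory from the planner, while the learned Q-function supplies the long-term preference among them.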
