Amortized Q-learning with Model-based Action Proposals for Autonomous Driving on Highways

Well-established optimization-based methods can guarantee an optimal trajectory for a short optimization horizon, typically no longer than a few seconds. As a result, choosing the optimal trajectory over this short horizon may still yield a sub-optimal long-term solution. At the same time, the resulting short-term trajectories allow for effective, comfortable, and provably safe maneuvers in a dynamic traffic environment. In this work, we address the question of how to ensure an optimal long-term driving strategy while keeping the benefits of classical trajectory planning. We introduce a Reinforcement Learning-based approach that, coupled with a trajectory planner, learns an optimal long-term decision-making strategy for driving on highways. By generating locally optimal maneuvers online and treating them as actions, we strike a balance between an infinite low-level continuous action space and the limited flexibility of a small set of predefined, standard lane-change actions. We evaluated our method on realistic scenarios in the open-source traffic simulator SUMO and achieved better performance than the four benchmark approaches we compared against: a random-action agent, a greedy agent, an agent with high-level discrete actions, and an IDM-based, SUMO-controlled agent.
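To make the decision loop concrete, the following is a minimal sketch (in PyTorch) of Q-learning over planner-generated action proposals: a planner produces a finite set of candidate maneuvers, a Q-network scores each state-maneuver pair, and the agent executes the argmax. All names here (`QNetwork`, `propose_maneuvers`, `select_action`), the network architecture, and the maneuver parameterization are illustrative assumptions, not the paper's implementation; in particular, the random proposal generator merely stands in for the actual model-based trajectory planner.

```python
import torch
import torch.nn as nn


class QNetwork(nn.Module):
    """Scores (state, maneuver) pairs; the architecture is illustrative only."""

    def __init__(self, state_dim, action_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state, actions):
        # state: (state_dim,), actions: (n_proposals, action_dim)
        s = state.unsqueeze(0).expand(actions.shape[0], -1)
        return self.net(torch.cat([s, actions], dim=-1)).squeeze(-1)


def propose_maneuvers(state, n_proposals=8, action_dim=4):
    """Stand-in for the model-based planner. In the paper's setting this would
    return locally optimal short-horizon trajectories generated online; here we
    just sample random parameter vectors (hypothetical parameterization, e.g.
    target lane, target velocity, maneuver duration)."""
    return torch.rand(n_proposals, action_dim)


def select_action(q_net, state):
    """Amortized maximization: evaluate Q only on the planner's proposals and
    pick the argmax, instead of maximizing over the continuous action space."""
    proposals = propose_maneuvers(state)
    with torch.no_grad():
        q_values = q_net(state, proposals)
    return proposals[q_values.argmax()]


# Toy usage with made-up dimensions.
state_dim, action_dim = 16, 4
q_net = QNetwork(state_dim, action_dim)
state = torch.rand(state_dim)
maneuver = select_action(q_net, state)
print("selected maneuver parameters:", maneuver)
```

Because the maximization is restricted to the proposal set, every executable action remains a feasible, locally optimal trajectory from the planner, while the learned Q-function supplies the long-term preference among them.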
