Reinforcement Learning with Probabilistic Guarantees for Autonomous Driving

Designing reliable decision strategies for autonomous urban driving is challenging. Reinforcement learning (RL) has been used to automatically derive suitable behavior in uncertain environments, but it does not provide any guarantee on the performance of the resulting policy. We propose a generic approach to enforce probabilistic guarantees on an RL agent. An exploration strategy is derived prior to training that constrains the agent to choose among actions that satisfy a desired probabilistic specification expressed with linear temporal logic (LTL). Reducing the search space to policies satisfying the LTL formula helps training and simplifies reward design. This paper outlines a case study of an intersection scenario involving multiple traffic participants. The resulting policy outperforms a rule-based heuristic approach in terms of efficiency while exhibiting strong guarantees on safety.
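The constrained exploration described above can be illustrated with a minimal sketch. Here a precomputed shield (which in the paper would be derived offline from the LTL specification by model checking) filters the action set before the agent acts, so that even random exploration never selects an action that violates the probabilistic guarantee. The state names, action names, and `shield` function are purely illustrative assumptions, not the paper's actual model.

```python
import random

def shield(state):
    """Toy shield: returns the subset of actions that satisfy the
    LTL-derived safety specification in the given state. In this
    hypothetical example, accelerating near an occupied intersection
    is assumed unsafe and is masked out."""
    if state == "near_intersection":
        return ["brake", "keep_speed"]
    return ["brake", "keep_speed", "accelerate"]

def shielded_epsilon_greedy(state, q_values, epsilon=0.1, rng=random):
    """Epsilon-greedy action selection restricted to shield-approved
    actions: exploration and exploitation both stay inside the safe set."""
    allowed = shield(state)
    if rng.random() < epsilon:
        return rng.choice(allowed)  # explore only among safe actions
    # Exploit: greedy over allowed actions only
    return max(allowed, key=lambda a: q_values.get((state, a), 0.0))

# Even though 'accelerate' has the highest Q-value, the shield
# prevents it from ever being chosen in this state.
q = {("near_intersection", "brake"): 0.2,
     ("near_intersection", "keep_speed"): 0.5,
     ("near_intersection", "accelerate"): 0.9}

action = shielded_epsilon_greedy("near_intersection", q, epsilon=0.0)
print(action)  # prints "keep_speed"
```

Because the mask is applied before action selection rather than as a reward penalty, the learned Q-function never needs to encode the safety constraint itself, which is what simplifies reward design.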
