Two-Stage Safe Reinforcement Learning for High-Speed Autonomous Racing

Decision making for autonomous driving is a safety-critical control problem. Prior work on safe reinforcement learning either tackles the problem with reward shaping or by modifying the exploration process. However, the former cannot guarantee safety during learning, while the latter relies heavily on expert knowledge to design an elaborate exploration policy. To date, only short-term decision making for low-speed driving has been achieved, and only in road scenes with simple geometries. In this paper, we propose a two-stage safe reinforcement learning algorithm that automatically learns a long-term policy for high-speed driving while guaranteeing safety throughout training. In the first learning stage, model-free reinforcement learning is followed by a rule-based safeguard module that avoids danger at low speed without expert fine-tuning. In the second learning stage, the rule-based module is replaced with a data-driven counterpart that yields a closed-form analytical safety solution for high-speed driving. Moreover, an adaptive reward function is designed to match the different objectives of the two learning stages, accelerating convergence to an optimal policy. Experiments are conducted on the racing simulator TORCS, which features complex racing tracks (e.g., sharp turns and hills). Compared with state-of-the-art baselines, our method achieves zero safety violations and quickly converges to a more efficient and stable policy, reaching an average speed of 127 km/h (3.3% higher than the best baseline) with an average swing of 3.96 degrees.
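The first-stage architecture described above (a model-free policy whose proposed actions pass through a rule-based safeguard before execution) can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the state fields (`speed`, `track_pos`), the safeguard thresholds, and the helper names are all assumptions chosen for clarity.

```python
import random

def rl_policy(state):
    # Hypothetical stand-in for the learned model-free policy: it proposes
    # a steering angle in [-1, 1] and a throttle in [0, 1] with no safety
    # awareness of its own.
    return {"steer": random.uniform(-1.0, 1.0),
            "throttle": random.uniform(0.0, 1.0)}

def rule_based_safeguard(state, action, max_safe_speed=60.0):
    # Hypothetical rule-based safeguard for the low-speed learning stage:
    # cut the throttle once a conservative speed limit is reached, and
    # clamp steering when the car drifts near the track edge. Both
    # thresholds are illustrative, not values from the paper.
    safe = dict(action)
    if state["speed"] >= max_safe_speed:
        safe["throttle"] = 0.0                 # stop accelerating at the limit
    if abs(state["track_pos"]) > 0.8:          # near the edge (normalized pos)
        safe["steer"] = max(-0.2, min(0.2, safe["steer"]))  # damp steering
    return safe

def stage1_step(state):
    # Stage-1 control loop: the policy proposes, the safeguard overrides
    # whenever one of its rules is violated, and only the filtered action
    # is sent to the environment.
    return rule_based_safeguard(state, rl_policy(state))
```

In the second stage, `rule_based_safeguard` would be swapped for a data-driven module with an analytical safety solution, so the training loop itself stays unchanged.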
