Offline Reinforcement Learning for Autonomous Driving with Safety and Exploration Enhancement

Reinforcement learning (RL) is a powerful data-driven control method that has been widely explored in autonomous driving tasks. However, conventional RL approaches learn control policies through trial-and-error interactions with the environment and may therefore cause disastrous consequences, such as collisions, when tested in real traffic. Offline RL has recently emerged as a promising framework for learning effective policies from previously collected, static datasets without requiring active interaction, making it especially appealing for autonomous driving applications. Despite this promise, existing offline RL algorithms such as Batch-Constrained deep Q-learning (BCQ) generally yield rather conservative policies with limited exploration efficiency. To address these issues, this paper presents an enhanced BCQ algorithm that employs a learnable parameter-noise scheme in the perturbation model to increase the diversity of observed actions. In addition, a Lyapunov-based safety enhancement strategy is incorporated to constrain the explorable state space within a safe region. Experimental results in highway and parking traffic scenarios show that our approach outperforms both a conventional RL method and state-of-the-art offline RL algorithms.
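The abstract describes the first ingredient, learnable parameter noise in the BCQ perturbation model, only at a high level. Below is a minimal PyTorch sketch of one plausible realization, in the spirit of NoisyNets-style parameter noise: the perturbation network's linear layers carry trainable noise scales, so the magnitude of action perturbation is learned rather than fixed. The layer sizes, the `NoisyLinear` module, and the `max_perturbation` bound are assumptions for illustration, not the authors' implementation.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class NoisyLinear(nn.Module):
    """Linear layer with learnable factorized Gaussian parameter noise:
    effective weights are mu + sigma * eps, where sigma is trained."""
    def __init__(self, in_features, out_features, sigma_init=0.5):
        super().__init__()
        self.in_features = in_features
        self.out_features = out_features
        self.weight_mu = nn.Parameter(torch.empty(out_features, in_features))
        self.weight_sigma = nn.Parameter(torch.empty(out_features, in_features))
        self.bias_mu = nn.Parameter(torch.empty(out_features))
        self.bias_sigma = nn.Parameter(torch.empty(out_features))
        # Noise samples are buffers, not parameters: resampled, never trained.
        self.register_buffer("weight_eps", torch.zeros(out_features, in_features))
        self.register_buffer("bias_eps", torch.zeros(out_features))
        bound = 1.0 / math.sqrt(in_features)
        nn.init.uniform_(self.weight_mu, -bound, bound)
        nn.init.uniform_(self.bias_mu, -bound, bound)
        nn.init.constant_(self.weight_sigma, sigma_init / math.sqrt(in_features))
        nn.init.constant_(self.bias_sigma, sigma_init / math.sqrt(in_features))
        self.reset_noise()

    @staticmethod
    def _scaled_noise(size):
        x = torch.randn(size)
        return x.sign() * x.abs().sqrt()

    def reset_noise(self):
        # Factorized noise: one vector per input and output dimension.
        eps_in = self._scaled_noise(self.in_features)
        eps_out = self._scaled_noise(self.out_features)
        self.weight_eps.copy_(eps_out.outer(eps_in))
        self.bias_eps.copy_(eps_out)

    def forward(self, x):
        if self.training:
            weight = self.weight_mu + self.weight_sigma * self.weight_eps
            bias = self.bias_mu + self.bias_sigma * self.bias_eps
        else:  # deterministic at evaluation time
            weight, bias = self.weight_mu, self.bias_mu
        return F.linear(x, weight, bias)

class PerturbationModel(nn.Module):
    """BCQ-style perturbation network xi(s, a): adds a small, bounded
    adjustment to candidate actions, here with learned parameter noise."""
    def __init__(self, state_dim, action_dim, max_action, max_perturbation=0.05):
        super().__init__()
        self.net = nn.Sequential(
            NoisyLinear(state_dim + action_dim, 256), nn.ReLU(),
            NoisyLinear(256, 256), nn.ReLU(),
            NoisyLinear(256, action_dim),
        )
        self.max_action = max_action
        self.max_perturbation = max_perturbation  # the Phi bound in BCQ

    def forward(self, state, action):
        xi = torch.tanh(self.net(torch.cat([state, action], dim=1)))
        xi = self.max_perturbation * self.max_action * xi
        # Perturbed action is kept inside the valid action range.
        return (action + xi).clamp(-self.max_action, self.max_action)
```

Because the noise scales are parameters of the perturbation network itself, they are updated by the same gradients that train the policy, which is what lets the degree of exploration adapt to the offline data rather than being hand-tuned.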
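For the second ingredient, a Lyapunov-based safety layer can be sketched as a filter over BCQ's candidate actions: a candidate is admissible only if a learned Lyapunov function V decreases along the predicted one-step dynamics. The `vae`, `dynamics`, and `V` models and the `decay` margin below are hypothetical stand-ins; the abstract does not specify how V or the dynamics model are obtained.

```python
import torch

@torch.no_grad()
def select_safe_action(state, vae, perturb, q_net, dynamics, V,
                       n_candidates=10, decay=0.1):
    """Pick the highest-value candidate action that satisfies a Lyapunov
    decrease condition V(s') <= (1 - decay) * V(s); fall back to the
    most-decreasing candidate if none qualify. `state` is (1, state_dim);
    V(.) is assumed to return shape (N, 1)."""
    states = state.repeat(n_candidates, 1)
    actions = perturb(states, vae.decode(states))   # BCQ candidate actions
    next_states = dynamics(states, actions)         # one-step prediction
    margin = V(next_states) - (1.0 - decay) * V(states)  # <= 0 means safe
    q_values = q_net(states, actions)
    q_values[margin.squeeze(-1) > 0] = -float("inf")  # mask unsafe candidates
    if torch.isinf(q_values).all():
        return actions[margin.squeeze(-1).argmin()]   # safest fallback
    return actions[q_values.squeeze(-1).argmax()]
```

Masking unsafe candidates before the argmax keeps the safety check a pure filter on top of standard BCQ action selection, so it can be disabled for ablation without touching the rest of the pipeline.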
