Offline Reinforcement Learning for Autonomous Driving with Safety and Exploration Enhancement

Reinforcement learning (RL) is a powerful data-driven control method that has been widely explored in autonomous driving tasks. However, conventional RL approaches learn control policies through trial-and-error interactions with the environment and may therefore cause disastrous consequences, such as collisions, when tested in real traffic. Offline RL has recently emerged as a promising framework for learning effective policies from previously collected, static datasets without requiring active interaction, making it especially appealing for autonomous driving applications. Despite this promise, existing offline RL algorithms such as Batch-Constrained deep Q-learning (BCQ) generally yield rather conservative policies with limited exploration efficiency. To address these issues, this paper presents an enhanced BCQ algorithm that employs a learnable parameter-noise scheme in the perturbation model to increase the diversity of observed actions. In addition, a Lyapunov-based safety enhancement strategy is incorporated to constrain the explorable state space within a safe region. Experimental results in highway and parking traffic scenarios show that our approach outperforms both a conventional RL method and state-of-the-art offline RL algorithms.
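The abstract describes the first ingredient, learnable parameter noise in the BCQ perturbation model, only at a high level. Below is a minimal PyTorch sketch of one plausible realization, in the spirit of NoisyNets-style parameter noise: the perturbation network's linear layers carry trainable noise scales, so the magnitude of action perturbation is learned rather than fixed. The layer sizes, the `NoisyLinear` module, and the `max_perturbation` bound are assumptions for illustration, not the authors' implementation.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class NoisyLinear(nn.Module):
    """Linear layer with learnable factorized Gaussian parameter noise:
    effective weights are mu + sigma * eps, where sigma is trained."""
    def __init__(self, in_features, out_features, sigma_init=0.5):
        super().__init__()
        self.in_features = in_features
        self.out_features = out_features
        self.weight_mu = nn.Parameter(torch.empty(out_features, in_features))
        self.weight_sigma = nn.Parameter(torch.empty(out_features, in_features))
        self.bias_mu = nn.Parameter(torch.empty(out_features))
        self.bias_sigma = nn.Parameter(torch.empty(out_features))
        # Noise samples are buffers, not parameters: resampled, never trained.
        self.register_buffer("weight_eps", torch.zeros(out_features, in_features))
        self.register_buffer("bias_eps", torch.zeros(out_features))
        bound = 1.0 / math.sqrt(in_features)
        nn.init.uniform_(self.weight_mu, -bound, bound)
        nn.init.uniform_(self.bias_mu, -bound, bound)
        nn.init.constant_(self.weight_sigma, sigma_init / math.sqrt(in_features))
        nn.init.constant_(self.bias_sigma, sigma_init / math.sqrt(in_features))
        self.reset_noise()

    @staticmethod
    def _scaled_noise(size):
        x = torch.randn(size)
        return x.sign() * x.abs().sqrt()

    def reset_noise(self):
        # Factorized noise: one vector per input and output dimension.
        eps_in = self._scaled_noise(self.in_features)
        eps_out = self._scaled_noise(self.out_features)
        self.weight_eps.copy_(eps_out.outer(eps_in))
        self.bias_eps.copy_(eps_out)

    def forward(self, x):
        if self.training:
            weight = self.weight_mu + self.weight_sigma * self.weight_eps
            bias = self.bias_mu + self.bias_sigma * self.bias_eps
        else:  # deterministic at evaluation time
            weight, bias = self.weight_mu, self.bias_mu
        return F.linear(x, weight, bias)

class PerturbationModel(nn.Module):
    """BCQ-style perturbation network xi(s, a): adds a small, bounded
    adjustment to candidate actions, here with learned parameter noise."""
    def __init__(self, state_dim, action_dim, max_action, max_perturbation=0.05):
        super().__init__()
        self.net = nn.Sequential(
            NoisyLinear(state_dim + action_dim, 256), nn.ReLU(),
            NoisyLinear(256, 256), nn.ReLU(),
            NoisyLinear(256, action_dim),
        )
        self.max_action = max_action
        self.max_perturbation = max_perturbation  # the Phi bound in BCQ

    def forward(self, state, action):
        xi = torch.tanh(self.net(torch.cat([state, action], dim=1)))
        xi = self.max_perturbation * self.max_action * xi
        # Perturbed action is kept inside the valid action range.
        return (action + xi).clamp(-self.max_action, self.max_action)
```

Because the noise scales are parameters of the perturbation network itself, they are updated by the same gradients that train the policy, which is what lets the degree of exploration adapt to the offline data rather than being hand-tuned.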
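For the second ingredient, a Lyapunov-based safety layer can be sketched as a filter over BCQ's candidate actions: a candidate is admissible only if a learned Lyapunov function V decreases along the predicted one-step dynamics. The `vae`, `dynamics`, and `V` models and the `decay` margin below are hypothetical stand-ins; the abstract does not specify how V or the dynamics model are obtained.

```python
import torch

@torch.no_grad()
def select_safe_action(state, vae, perturb, q_net, dynamics, V,
                       n_candidates=10, decay=0.1):
    """Pick the highest-value candidate action that satisfies a Lyapunov
    decrease condition V(s') <= (1 - decay) * V(s); fall back to the
    most-decreasing candidate if none qualify. `state` is (1, state_dim);
    V(.) is assumed to return shape (N, 1)."""
    states = state.repeat(n_candidates, 1)
    actions = perturb(states, vae.decode(states))   # BCQ candidate actions
    next_states = dynamics(states, actions)         # one-step prediction
    margin = V(next_states) - (1.0 - decay) * V(states)  # <= 0 means safe
    q_values = q_net(states, actions)
    q_values[margin.squeeze(-1) > 0] = -float("inf")  # mask unsafe candidates
    if torch.isinf(q_values).all():
        return actions[margin.squeeze(-1).argmin()]   # safest fallback
    return actions[q_values.squeeze(-1).argmax()]
```

Masking unsafe candidates before the argmax keeps the safety check a pure filter on top of standard BCQ action selection, so it can be disabled for ablation without touching the rest of the pipeline.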
