Exploration Strategy Improved DDPG for Lane Keeping Tasks in Autonomous Driving

We propose an Exploration Strategy Improved Deep Deterministic Policy Gradient algorithm, called ESI-DDPG, for lane-keeping tasks in autonomous driving. The actor network in DDPG outputs a deterministic policy, so noise must be added to the actions for the autonomous vehicle to explore the environment sufficiently and learn an optimal policy. However, the exploration noise carries a large initial weight, which causes the vehicle to perform a great deal of ineffective exploration in the early stage of training. We therefore incorporate the Stanley method to apply a weighted correction to the exploration noise, so that the vehicle's exploration is biased toward the correct direction and training efficiency is improved. Moreover, because better sample data are collected during training, the driving policy that is eventually learned is also better. We choose TORCS as the experimental platform, and the results show that, compared with DDPG, TD3, and SAC, the proposed algorithm learns a driving policy faster and the final policy yields smaller trajectory error while driving.
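The idea of biasing early exploration toward the Stanley steering direction can be sketched as follows. This is a minimal illustrative sketch, not the paper's exact scheme: the decay schedule, the Gaussian stand-in for the exploration noise, and the blending weight `w` are all assumptions; the Stanley steering law itself (heading error plus the arctangent of cross-track error over speed) is standard.

```python
import math
import random

def stanley_steer(heading_error, cross_track_error, speed, k=1.0, eps=1e-6):
    """Stanley steering law: heading error + arctan(k * e / v)."""
    return heading_error + math.atan2(k * cross_track_error, speed + eps)

def esi_action(actor_steer, heading_error, cross_track_error, speed,
               noise_scale, episode, decay=0.995):
    """Blend raw exploration noise with a Stanley-based correction.

    Early in training (w close to 1) exploration is pulled toward the
    Stanley steering direction; as the correction weight decays, plain
    noisy exploration of the actor's own policy takes over.
    Both the decay schedule and the Gaussian noise are illustrative
    stand-ins, not the paper's exact formulation.
    """
    w = decay ** episode                      # hypothetical decay schedule
    noise = random.gauss(0.0, noise_scale)    # stand-in for OU exploration noise
    guide = stanley_steer(heading_error, cross_track_error, speed)
    # Weighted correction: interpolate between noisy actor output and
    # the Stanley-guided steering direction.
    steer = actor_steer + (1.0 - w) * noise + w * (guide - actor_steer)
    return max(-1.0, min(1.0, steer))         # clip to the steering range
```

With `episode = 0` the weight is 1 and the action follows the Stanley direction; as `episode` grows, the correction vanishes and the actor's (noisy) output dominates, which matches the abstract's description of guiding only the early exploration.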
