Hierarchical Reinforcement Learning Method for Autonomous Vehicle Behavior Planning

In this work, we propose a hierarchical reinforcement learning (HRL) structure capable of performing autonomous vehicle planning tasks with multiple sub-goals in simulated environments. Within this hierarchical structure, the network can 1) learn one task with multiple sub-goals simultaneously; 2) extract state attention according to the changing sub-goals during learning; and 3) reuse the trained sub-goal networks for other similar tasks that share the same sub-goals. The states are defined as processed observations transmitted from the autonomous vehicle's perception system. A hybrid reward mechanism is designed for the different hierarchical layers of the proposed HRL structure. Compared to traditional RL methods, our algorithm is more sample-efficient because its modular design allows the sub-goal policies to be reused across similar tasks. The results show that the proposed method converges to an optimal policy faster than traditional RL methods.
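The abstract does not give implementation details, so the following is only a minimal toy sketch of the general two-level pattern it describes: a high-level layer that steps through sub-goals and collects the extrinsic reward, and a low-level policy that is trained with an intrinsic reward for reaching the current sub-goal. The 1-D environment, the tabular Q-learning update, the reward values, and all names here are illustrative assumptions, not the paper's actual environment, networks, or reward design.

```python
import random

# Toy 1-D task: the vehicle starts at position 0 and must reach
# sub-goal position 3, then sub-goal position 7 (both assumed here).
SUBGOALS = [3, 7]
ACTIONS = [-1, 0, 1]  # move back / hold / move forward (toy action set)

def hierarchical_episode(low_q, eps=0.1, max_steps=50):
    """One episode of a two-level loop. For brevity the high level follows
    a fixed sub-goal order; only the low-level sub-goal policy is learned.
    Hybrid reward: the low level gets an intrinsic reward for reaching the
    active sub-goal, the high level accumulates the extrinsic reward."""
    pos, t, extrinsic_return = 0, 0, 0.0
    goal_idx = 0
    while goal_idx < len(SUBGOALS) and t < max_steps:
        goal = SUBGOALS[goal_idx]          # current high-level sub-goal
        while pos != goal and t < max_steps:
            state = (pos, goal)            # sub-goal is part of the state
            # epsilon-greedy action selection over the low-level Q-table
            if random.random() < eps:
                a = random.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda b: low_q.get((state, b), 0.0))
            nxt = max(0, pos + a)
            # intrinsic reward: bonus for reaching the sub-goal, small step cost
            intrinsic = 1.0 if nxt == goal else -0.01
            ns = (nxt, goal)
            best_next = max(low_q.get((ns, b), 0.0) for b in ACTIONS)
            old = low_q.get((state, a), 0.0)
            # one-step tabular Q-learning update for the low-level policy
            low_q[(state, a)] = old + 0.5 * (intrinsic + 0.9 * best_next - old)
            pos, t = nxt, t + 1
        if pos == goal:
            extrinsic_return += 10.0       # extrinsic reward for the high level
            goal_idx += 1
    return extrinsic_return

random.seed(0)
low_q = {}  # shared table of sub-goal-conditioned values, reusable across tasks
returns = [hierarchical_episode(low_q) for _ in range(200)]
```

Because the low-level table is conditioned on the sub-goal, its entries for a given sub-goal can in principle be reused by any other task that contains that sub-goal, which is the modularity property the abstract attributes to the proposed structure.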
