Hierarchical Reinforcement Learning Method for Autonomous Vehicle Behavior Planning

In this work, we propose a hierarchical reinforcement learning (HRL) structure capable of performing autonomous vehicle planning tasks with multiple sub-goals in simulated environments. Within this hierarchical structure, the network can 1) learn one task with multiple sub-goals simultaneously; 2) extract state attention according to the changing sub-goals during learning; and 3) reuse the trained sub-goal networks for other similar tasks that share the same sub-goals. The states are defined as processed observations transmitted from the autonomous vehicle's perception system. A hybrid reward mechanism is designed for the different hierarchical layers of the proposed HRL structure. Compared to traditional RL methods, our algorithm is more sample-efficient because its modular design allows the sub-goal policies to be reused across similar tasks. The results show that the proposed method converges to an optimal policy faster than traditional RL methods.
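The abstract does not give implementation details, so the following is only a minimal toy sketch of the general two-level pattern it describes: a high-level layer that steps through sub-goals and collects the extrinsic reward, and a low-level policy that is trained with an intrinsic reward for reaching the current sub-goal. The 1-D environment, the tabular Q-learning update, the reward values, and all names here are illustrative assumptions, not the paper's actual environment, networks, or reward design.

```python
import random

# Toy 1-D task: the vehicle starts at position 0 and must reach
# sub-goal position 3, then sub-goal position 7 (both assumed here).
SUBGOALS = [3, 7]
ACTIONS = [-1, 0, 1]  # move back / hold / move forward (toy action set)

def hierarchical_episode(low_q, eps=0.1, max_steps=50):
    """One episode of a two-level loop. For brevity the high level follows
    a fixed sub-goal order; only the low-level sub-goal policy is learned.
    Hybrid reward: the low level gets an intrinsic reward for reaching the
    active sub-goal, the high level accumulates the extrinsic reward."""
    pos, t, extrinsic_return = 0, 0, 0.0
    goal_idx = 0
    while goal_idx < len(SUBGOALS) and t < max_steps:
        goal = SUBGOALS[goal_idx]          # current high-level sub-goal
        while pos != goal and t < max_steps:
            state = (pos, goal)            # sub-goal is part of the state
            # epsilon-greedy action selection over the low-level Q-table
            if random.random() < eps:
                a = random.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda b: low_q.get((state, b), 0.0))
            nxt = max(0, pos + a)
            # intrinsic reward: bonus for reaching the sub-goal, small step cost
            intrinsic = 1.0 if nxt == goal else -0.01
            ns = (nxt, goal)
            best_next = max(low_q.get((ns, b), 0.0) for b in ACTIONS)
            old = low_q.get((state, a), 0.0)
            # one-step tabular Q-learning update for the low-level policy
            low_q[(state, a)] = old + 0.5 * (intrinsic + 0.9 * best_next - old)
            pos, t = nxt, t + 1
        if pos == goal:
            extrinsic_return += 10.0       # extrinsic reward for the high level
            goal_idx += 1
    return extrinsic_return

random.seed(0)
low_q = {}  # shared table of sub-goal-conditioned values, reusable across tasks
returns = [hierarchical_episode(low_q) for _ in range(200)]
```

Because the low-level table is conditioned on the sub-goal, its entries for a given sub-goal can in principle be reused by any other task that contains that sub-goal, which is the modularity property the abstract attributes to the proposed structure.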
