Trajectory Planning for Autonomous Vehicles Using Hierarchical Reinforcement Learning

Planning safe trajectories under uncertain and dynamic conditions makes autonomous driving a significantly complex problem. Sampling-based methods such as Rapidly-exploring Random Trees (RRTs) are ill-suited to this problem because of their high computational cost, while supervised approaches such as imitation learning lack generalization and safety guarantees. To address these problems and provide a robust framework, we propose a Hierarchical Reinforcement Learning (HRL) structure combined with a Proportional-Integral-Derivative (PID) controller for trajectory planning. HRL decomposes the driving task into sub-goals and enables the network to learn policies for both high-level options and low-level trajectory-planner choices. The introduction of sub-goals decreases convergence time and allows the learned policies to be reused in other scenarios. In addition, the proposed planner is made robust by guaranteeing smooth trajectories and by handling the noisy perception system of the ego vehicle. The PID controller tracks the waypoints, which ensures smooth trajectories and reduces jerk, and the problem of incomplete observations is handled by a Long Short-Term Memory (LSTM) layer in the network. Results from the high-fidelity CARLA simulator indicate that the proposed method reduces convergence time, generates smoother trajectories, and is able to handle dynamic surroundings and noisy observations.
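The low-level waypoint-tracking idea can be illustrated with a minimal sketch. The gains, timestep, and the toy one-dimensional tracking dynamics below are illustrative assumptions for exposition, not values or code from the paper:

```python
class PID:
    """Discrete PID controller (illustrative gains, not the paper's tuning)."""

    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = None

    def step(self, error):
        # Accumulate the integral term and approximate the derivative
        # with a backward difference (zero on the first call).
        self.integral += error * self.dt
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def track_waypoint(y0, target, controller, steps=400):
    """Toy 1-D lateral offset model: the control output directly
    adjusts the position each timestep (a stand-in for vehicle dynamics)."""
    y = y0
    for _ in range(steps):
        y += controller.step(target - y) * controller.dt
    return y


pid = PID(kp=1.5, ki=0.1, kd=0.3, dt=0.05)
final = track_waypoint(0.0, 1.0, pid)  # converges close to the 1.0 m target
```

In the actual planner the controller would act on steering and throttle of the ego vehicle rather than directly on position; the point of the sketch is that smooth error feedback, rather than discrete jumps between waypoints, is what keeps the resulting trajectory low-jerk.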
