Advanced planning for autonomous vehicles using reinforcement learning and deep inverse reinforcement learning

Abstract Autonomous vehicles promise to improve traffic safety while simultaneously increasing fuel efficiency and reducing congestion; they represent the main trend in future intelligent transportation systems. This paper concentrates on the planning problem for autonomous vehicles in traffic. We model the interaction between the autonomous vehicle and its environment as a stochastic Markov decision process (MDP) and take the driving style of an expert driver as the target behavior to be learned. The road geometry is incorporated into the MDP model so that more diverse driving styles can be represented. The desired, expert-like driving behavior of the autonomous vehicle is obtained in two ways. First, we design the reward function of the corresponding MDP directly and compute the optimal driving strategy using reinforcement learning techniques. Second, we collect a number of demonstrations from an expert driver and learn the optimal driving strategy from these data using inverse reinforcement learning, approximating the expert's unknown reward function with a deep neural network (DNN). We clarify and validate the application of the maximum entropy principle (MEP) to learning the DNN reward function, and provide the derivations necessary for using the maximum entropy principle to learn a parameterized feature (reward) function. Simulation results demonstrate the desired driving behaviors of an autonomous vehicle under both the reinforcement learning and inverse reinforcement learning techniques.
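The first step of the pipeline described above, designing a reward function for an MDP and solving it with reinforcement learning, can be illustrated with a minimal sketch. The toy environment below (a five-state "lane" with a hand-designed goal reward, solved by tabular Q-learning) is purely a hypothetical example for illustration; it is not the paper's road-geometry MDP or its actual reward design.

```python
import random

# Illustrative toy MDP, not the paper's model: states are positions 0..4
# along a lane, action 0 stays in place, action 1 advances one position,
# and reaching state 4 (the goal) yields a hand-designed reward of 1.
N_STATES, N_ACTIONS, GOAL = 5, 2, 4

def step(state, action):
    """One environment transition: returns (next_state, reward, done)."""
    next_state = min(state + action, GOAL)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward, next_state == GOAL

def q_learning(episodes=500, alpha=0.5, gamma=0.9, eps=0.1, seed=0):
    """Tabular Q-learning with an epsilon-greedy exploration policy."""
    rng = random.Random(seed)
    Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            if rng.random() < eps:                      # explore
                a = rng.randrange(N_ACTIONS)
            else:                                       # exploit
                a = max(range(N_ACTIONS), key=lambda a: Q[s][a])
            s2, r, done = step(s, a)
            # Q-learning update toward the Bellman target r + gamma*max_a' Q(s',a').
            Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
            s = s2
    return Q

Q = q_learning()
# Greedy policy extracted from the learned Q-values.
policy = [max(range(N_ACTIONS), key=lambda a: Q[s][a]) for s in range(N_STATES)]
print(policy)
```

With this reward design, the learned greedy policy advances toward the goal from every non-terminal state; in the paper's second approach, the hand-designed reward in `step` would instead be replaced by a DNN reward learned from expert demonstrations via maximum entropy inverse reinforcement learning.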
