A Reinforcement Learning Approach to Autonomous Decision Making of Intelligent Vehicles on Highways

Autonomous decision making is a critical and difficult task for intelligent vehicles in dynamic transportation environments. In this paper, a reinforcement learning (RL) approach with value function approximation and feature learning is proposed for autonomous decision making of intelligent vehicles on highways. In the proposed approach, the sequential decision-making problem for lane changing and overtaking is modeled as a Markov decision process with multiple goals, such as safety, speed, and smoothness. To learn optimized policies for autonomous decision making, a multiobjective approximate policy iteration (MO-API) algorithm is presented. The features for value function approximation are learned in a data-driven way, where sparse kernel-based features or manifold-based features can be constructed from data samples. Compared with previous RL algorithms such as multiobjective Q-learning, the MO-API approach uses data-driven feature representations for value and policy approximation, so that better learning efficiency can be achieved. A highway simulation environment using a 14-degree-of-freedom vehicle dynamics model was established to generate training data and to test the performance of different decision-making methods for intelligent vehicles on highways. The results illustrate the advantages of the proposed MO-API method under different traffic conditions. Furthermore, we also tested the learned decision policy on a real autonomous vehicle to perform overtaking decision and control in normal highway traffic. The experimental results further demonstrate the effectiveness of the proposed method.
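The core idea of combining a multi-goal MDP with approximate policy iteration can be illustrated with a minimal sketch. The code below is not the paper's MO-API implementation: it uses a toy three-state "lane" MDP, a fixed-weight scalarization of a vector reward (safety, speed), and one-hot features over state-action pairs as a stand-in for the learned kernel- or manifold-based features; all names and numbers are illustrative assumptions.

```python
import numpy as np

# Toy 3-state, 2-action MDP: action 0 = keep lane, action 1 = change lane.
n_states, n_actions, gamma = 3, 2, 0.9
weights = np.array([0.6, 0.4])             # preference over (safety, speed)

P = np.array([[0, 1], [1, 2], [2, 2]])     # deterministic next state P[s, a]
# Vector rewards R[s, a] = (safety, speed): lane changes cost safety,
# higher-index lanes permit higher speed.
R = np.zeros((n_states, n_actions, 2))
R[:, 1, 0] = -1.0                          # changing lanes is less safe
R[:, :, 1] = np.arange(n_states)[:, None]  # speed grows with lane index

def phi(s, a):
    """One-hot feature vector for a (state, action) pair."""
    f = np.zeros(n_states * n_actions)
    f[s * n_actions + a] = 1.0
    return f

def lstd_q(policy):
    """Least-squares fixed point of Q^pi under the scalarized reward."""
    A = np.zeros((n_states * n_actions,) * 2)
    b = np.zeros(n_states * n_actions)
    for s in range(n_states):
        for a in range(n_actions):
            s2 = P[s, a]
            f, f2 = phi(s, a), phi(s2, policy[s2])
            A += np.outer(f, f - gamma * f2)
            b += f * (weights @ R[s, a])
    return np.linalg.solve(A, b)

# Approximate policy iteration: evaluate with least squares, improve greedily.
policy = np.zeros(n_states, dtype=int)
for _ in range(20):
    w = lstd_q(policy)
    q = w.reshape(n_states, n_actions)
    new_policy = q.argmax(axis=1)
    if np.array_equal(new_policy, policy):
        break
    policy = new_policy
print(policy)  # prints [1 1 0]: change lanes toward the fast lane, then keep it
```

With one-hot features the least-squares evaluation is exact, so the loop converges in a couple of iterations; the paper's contribution lies in replacing these hand-coded features with data-driven ones and in handling the multiple objectives beyond a fixed scalarization.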
