Development of an Efficient Driving Strategy for Connected and Automated Vehicles at Signalized Intersections: A Reinforcement Learning Approach

Connected and Automated Vehicles (CAVs) can share real-time traffic information across vehicle networks. As a result, a vehicle's driving behaviour no longer has to rely solely on the driver's limited and incomplete observations. By taking advantage of this shared information, CAVs can drive in a more responsible, accurate and efficient way. This study proposes a reinforcement-learning-based car-following model for CAVs that learns, in real time, a driving behaviour aimed at improving travel efficiency, fuel consumption and safety at signalized intersections. The results show that, given an effective reward function, the learned controller performs well under different traffic demands and under traffic-light cycles of different durations. This study highlights the considerable potential of emerging reinforcement learning technologies in transport research and applications.
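The abstract stresses that the controller's quality depends on the reward function but does not state that function. As a minimal, hypothetical sketch of how a per-step reward could trade off the three stated objectives (travel efficiency, fuel consumption and safety at a signalized intersection), the Python below assumes an invented state (gap to the leader, ego and leader speeds, distance to the stop line, signal state), invented weights, a normalized-speed efficiency term, an acceleration-times-speed fuel proxy standing in for a calibrated fuel model, a time-to-collision (TTC) safety penalty, and a red-light violation penalty. None of these names, terms or values come from the study.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class CarFollowingState:
    """Hypothetical observation for one CAV approaching a signalized intersection."""
    gap: float           # distance to the leading vehicle (m)
    speed: float         # ego-vehicle speed (m/s)
    lead_speed: float    # leading-vehicle speed (m/s)
    dist_to_stop: float  # distance to the stop line (m); negative once past it
    signal_red: bool     # True while the ego approach shows red


@dataclass(frozen=True)
class RewardWeights:
    """Illustrative weights only (assumption); not taken from the paper."""
    efficiency: float = 1.0
    fuel: float = 0.1
    safety: float = 5.0
    red_light: float = 10.0


def time_to_collision(gap: float, speed: float, lead_speed: float) -> float:
    """Seconds until the ego vehicle reaches its leader at current speeds."""
    closing_speed = speed - lead_speed
    if closing_speed <= 0.0:
        return math.inf  # not closing in, so no collision at current speeds
    return gap / closing_speed


def car_following_reward(
    s: CarFollowingState,
    accel: float,
    dt: float = 0.1,
    v_max: float = 15.0,
    ttc_threshold: float = 4.0,
    w: RewardWeights = RewardWeights(),
) -> float:
    """Hypothetical weighted sum of efficiency, fuel and safety terms for one step."""
    # Efficiency: reward normalized speed (faster progress through the approach).
    efficiency = min(s.speed, v_max) / v_max

    # Fuel: hypothetical proxy penalizing positive tractive power (accel * speed).
    fuel = max(accel, 0.0) * s.speed

    # Safety: linear penalty once time-to-collision drops below the threshold.
    ttc = time_to_collision(s.gap, s.speed, s.lead_speed)
    ttc_penalty = max(0.0, ttc_threshold - ttc) / ttc_threshold

    # Signal compliance: penalize crossing the stop line on red during this step
    # (displacement approximated at the current speed).
    next_dist = s.dist_to_stop - s.speed * dt
    ran_red = s.signal_red and s.dist_to_stop >= 0.0 and next_dist < 0.0

    return (
        w.efficiency * efficiency
        - w.fuel * fuel
        - w.safety * ttc_penalty
        - (w.red_light if ran_red else 0.0)
    )


# Usage: steady following on green versus closing fast on a slower leader.
cruise = CarFollowingState(gap=50.0, speed=10.0, lead_speed=10.0,
                           dist_to_stop=100.0, signal_red=False)
tailgate = CarFollowingState(gap=10.0, speed=10.0, lead_speed=5.0,
                             dist_to_stop=100.0, signal_red=False)
print(car_following_reward(cruise, accel=0.0))
print(car_following_reward(tailgate, accel=0.0))  # lower: TTC is below threshold
```

Any reinforcement learning algorithm, value-based or actor-critic, could consume a reward of this shape; the sketch does not imply which one the study used. Raising the `safety` weight relative to `efficiency` would tend to push a learned policy toward longer headways at some cost in travel time.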