Multi-Agent Deep Reinforcement Learning for Urban Traffic Light Control in Vehicular Networks

Because urban traffic conditions are diverse and complicated, applying reinforcement learning to reduce traffic congestion has become an active and promising research topic. In particular, coordinating the traffic light controllers of multiple intersections is a key challenge for multi-agent reinforcement learning (MARL). Most existing MARL studies are based on traditional $Q$-learning, but the non-stationary environment leads to poor learning in complicated and dynamic traffic scenarios. In this paper, we propose a novel multi-agent recurrent deep deterministic policy gradient (MARDDPG) algorithm, based on the deep deterministic policy gradient (DDPG) algorithm, for traffic light control (TLC) in vehicular networks. Specifically, centralized learning in each critic network enables each agent to estimate the policies of the other agents during decision-making, so that the agents can coordinate with one another; this alleviates the poor learning performance caused by environmental instability. Decentralized execution enables each agent to make decisions independently. We share parameters among the actor networks to speed up training and reduce the memory footprint. Adding an LSTM helps alleviate the instability caused by partially observable states. We use surveillance cameras and vehicular networks to collect status information for each intersection. Unlike previous work, we consider not only vehicles but also pedestrians waiting to cross the intersection. Moreover, we assign different priorities to buses and ordinary vehicles. Experimental results in a vehicular network show that our method runs stably in various scenarios and coordinates multiple intersections, significantly reducing both vehicle and pedestrian congestion.
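To make the division of labor between actors and critics concrete, the following is a minimal sketch of centralized critics with decentralized, parameter-shared actors. It is an illustration under stated assumptions, not the MARDDPG implementation: the class names `SharedActor` and `CentralCritic`, the single linear layers, the dimensions, and the random observations are all hypothetical, and the LSTM and the training updates described above are omitted.

```python
import math
import random


def make_layer(n_in, n_out, rng):
    """Create a small random linear layer as (weights, bias). Illustrative only."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    bias = [0.0] * n_out
    return weights, bias


def linear(layer, x):
    """Apply a linear layer to the input vector x."""
    weights, bias = layer
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


class SharedActor:
    """One set of actor parameters reused by every intersection (hypothetical)."""

    def __init__(self, obs_dim, act_dim, rng):
        self.layer = make_layer(obs_dim, act_dim, rng)

    def act(self, local_obs):
        # Decentralized execution: only the agent's own observation is used.
        return [math.tanh(v) for v in linear(self.layer, local_obs)]


class CentralCritic:
    """Per-agent critic that sees every agent's observation and action (hypothetical)."""

    def __init__(self, n_agents, obs_dim, act_dim, rng):
        self.layer = make_layer(n_agents * (obs_dim + act_dim), 1, rng)

    def q_value(self, all_obs, all_actions):
        # Centralized learning: the critic input is the joint observation-action vector.
        joint = [v for obs in all_obs for v in obs]
        joint += [v for act in all_actions for v in act]
        return linear(self.layer, joint)[0]


# Assumed sizes: 3 intersections, 4 observation features, 2 action values.
rng = random.Random(0)
n_agents, obs_dim, act_dim = 3, 4, 2

actor = SharedActor(obs_dim, act_dim, rng)
critics = [CentralCritic(n_agents, obs_dim, act_dim, rng) for _ in range(n_agents)]

observations = [[rng.random() for _ in range(obs_dim)] for _ in range(n_agents)]
actions = [actor.act(obs) for obs in observations]
q_values = [critic.q_value(observations, actions) for critic in critics]
```

In this sketch the critics are needed only during training, so at execution time each intersection would need nothing beyond the shared actor and its own observation.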
