Large-Scale Traffic Signal Control Using a Novel Multiagent Reinforcement Learning

Finding the optimal signal timing strategy is difficult in large-scale traffic signal control (TSC). Multiagent reinforcement learning (MARL) is a promising approach to this problem. However, existing methods leave room for improvement in scaling to large problems and in modeling, for each individual agent, the behaviors of the other agents. In this article, a new MARL algorithm, called cooperative double Q-learning (Co-DQL), is proposed, which has several prominent features. It uses a highly scalable independent double Q-learning method based on double estimators and the upper confidence bound (UCB) policy, which can eliminate the overestimation problem of traditional independent Q-learning while ensuring exploration. It uses mean-field approximation to model the interaction among agents, thereby helping agents learn a better cooperative strategy. To improve the stability and robustness of the learning process, we introduce a new reward allocation mechanism and a local state sharing method. In addition, we analyze the convergence properties of the proposed algorithm. Co-DQL is applied to TSC and tested on various traffic flow scenarios in TSC simulators. The results show that Co-DQL outperforms state-of-the-art decentralized MARL algorithms on multiple traffic metrics.
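
The independent double Q-learning component with UCB exploration mentioned in the abstract can be illustrated in tabular form. The following is a minimal sketch under stated assumptions, not the Co-DQL algorithm itself: it omits the mean-field approximation, the reward allocation mechanism, and local state sharing. The class name `DoubleQUCBAgent`, the hyperparameter values, and the UCB1-style bonus are illustrative choices that do not come from the article.

```python
import math
import random
from collections import defaultdict


class DoubleQUCBAgent:
    """Tabular double Q-learning with UCB action selection (illustrative sketch)."""

    def __init__(self, n_actions, alpha=0.1, gamma=0.95, c=1.0, seed=0):
        self.n_actions = n_actions
        self.alpha = alpha  # learning rate (assumed value)
        self.gamma = gamma  # discount factor (assumed value)
        self.c = c          # UCB exploration weight (assumed value)
        self.rng = random.Random(seed)
        # Two independent value estimators, Q^A and Q^B.
        self.qa = defaultdict(lambda: [0.0] * n_actions)
        self.qb = defaultdict(lambda: [0.0] * n_actions)
        # Visit counts per state and per (state, action) for the UCB bonus.
        self.state_visits = defaultdict(int)
        self.action_visits = defaultdict(lambda: [0] * n_actions)

    def act(self, state):
        """Pick the action maximizing the mean of Q^A and Q^B plus a UCB bonus."""
        counts = self.action_visits[state]
        # Try every action once before the bonus formula is well defined.
        for a in range(self.n_actions):
            if counts[a] == 0:
                return a
        total = self.state_visits[state]

        def score(a):
            mean_q = 0.5 * (self.qa[state][a] + self.qb[state][a])
            return mean_q + self.c * math.sqrt(math.log(total) / counts[a])

        return max(range(self.n_actions), key=score)

    def update(self, state, action, reward, next_state):
        """Update one estimator, evaluating its greedy action with the other."""
        self.state_visits[state] += 1
        self.action_visits[state][action] += 1
        # Randomly choose which estimator to update on this step.
        if self.rng.random() < 0.5:
            select, evaluate = self.qa, self.qb
        else:
            select, evaluate = self.qb, self.qa
        # The greedy action comes from the updated estimator; its value comes
        # from the other one, which is what counters overestimation.
        best = max(range(self.n_actions), key=lambda a: select[next_state][a])
        target = reward + self.gamma * evaluate[next_state][best]
        select[state][action] += self.alpha * (target - select[state][action])
```

In a multi-intersection setting, one such agent would be instantiated per intersection, with `state` a hashable encoding of local traffic conditions and `action` a signal phase index. Both encodings are assumptions of this sketch.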
