Paper information - Large-Scale Traffic Signal Control Using a Novel Multiagent Reinforcement Learning

Finding the optimal signal timing strategy is a difficult task for the problem of large-scale traffic signal control (TSC). Multiagent reinforcement learning (MARL) is a promising method to solve this problem. However, there is still room for improvement in extending to large-scale problems and modeling the behaviors of other agents for each individual agent. In this article, a new MARL, called cooperative double Q-learning (Co-DQL), is proposed, which has several prominent features. It uses a highly scalable independent double Q-learning method based on double estimators and the upper confidence bound (UCB) policy, which can eliminate the over-estimation problem existing in traditional independent Q-learning while ensuring exploration. It uses mean-field approximation to model the interaction among agents, thereby making agents learn a better cooperative strategy. In order to improve the stability and robustness of the learning process, we introduce a new reward allocation mechanism and a local state sharing method. In addition, we analyze the convergence properties of the proposed algorithm. Co-DQL is applied to TSC and tested on various traffic flow scenarios of TSC simulators. The results show that Co-DQL outperforms the state-of-the-art decentralized MARL algorithms in terms of multiple traffic metrics.
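To make the "double estimators plus UCB" idea in the abstract more concrete, the following is a minimal sketch of a single tabular agent that combines standard double Q-learning with a UCB1-style exploration bonus. It is not the paper's Co-DQL: the mean-field interaction term, the reward allocation mechanism, and the local state sharing are all omitted, and the class name `DoubleQUCBAgent`, the hyperparameter values, and the exact form of the bonus are assumptions chosen for illustration.

```python
import math
import random
from collections import defaultdict


class DoubleQUCBAgent:
    """Tabular double Q-learning with UCB1-style action selection.

    Illustrative only; names and default values are assumptions,
    not taken from the paper.
    """

    def __init__(self, n_actions, alpha=0.1, gamma=0.95, c=1.0, seed=0):
        self.n_actions = n_actions
        self.alpha = alpha  # learning rate (assumed value)
        self.gamma = gamma  # discount factor (assumed value)
        self.c = c          # weight of the exploration bonus (assumed value)
        self.rng = random.Random(seed)
        # Two independent value tables: the "double estimators".
        self.qa = defaultdict(lambda: [0.0] * n_actions)
        self.qb = defaultdict(lambda: [0.0] * n_actions)
        # Per-state visit counts for each action, used by the UCB bonus.
        self.counts = defaultdict(lambda: [0] * n_actions)

    def select_action(self, state):
        counts = self.counts[state]
        # Try every action once before the bonus formula is applicable.
        for action, n in enumerate(counts):
            if n == 0:
                return action
        total = sum(counts)

        def score(action):
            mean_q = 0.5 * (self.qa[state][action] + self.qb[state][action])
            bonus = self.c * math.sqrt(math.log(total) / counts[action])
            return mean_q + bonus

        return max(range(self.n_actions), key=score)

    def update(self, state, action, reward, next_state, done=False):
        self.counts[state][action] += 1
        # Pick at random which table learns; the other one evaluates.
        if self.rng.random() < 0.5:
            learn, evaluate = self.qa, self.qb
        else:
            learn, evaluate = self.qb, self.qa
        if done:
            target = reward
        else:
            # The learning table chooses the greedy next action, the other
            # table scores it. Decoupling selection from evaluation is what
            # counters the over-estimation of a single max operator.
            best = max(range(self.n_actions), key=lambda a: learn[next_state][a])
            target = reward + self.gamma * evaluate[next_state][best]
        learn[state][action] += self.alpha * (target - learn[state][action])
```

In a TSC setting, each intersection would hold one such agent, with `state` standing for a discretized local traffic observation and `action` for a signal phase choice; both encodings are left open here because the abstract does not specify them.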
