Multi-Agent Reinforcement Learning for Integrated Network of Adaptive Traffic Signal Controllers (MARLIN-ATSC)

Traffic congestion in the Greater Toronto Area costs Canada $6 billion per year and is expected to grow to $15 billion per year within the next few decades. Adaptive Traffic Signal Control (ATSC) is a promising technique for alleviating traffic congestion. For medium to large transportation networks, coordinated ATSC becomes a challenging problem because the number of system states and actions grows exponentially with the number of networked intersections. Efficient and robust controllers can be designed using a multi-agent reinforcement learning (MARL) approach in which each controller (agent) is responsible for the traffic lights around a single junction. This paper presents a novel, decentralized and coordinated adaptive real-time traffic signal control system, Multi-Agent Reinforcement Learning for Integrated Network of Adaptive Traffic Signal Controllers (MARLIN-ATSC), that aims to minimize the total vehicle delay in the traffic network. The system is tested using microscopic traffic simulation software (PARAMICS) on a network of five signalized intersections in downtown Toronto. The performance of MARLIN-ATSC is compared against two approaches: conventional pretimed signal control (B1) and independent RL-based control agents with no coordination (B2). The results show network-wide average delay savings of 32% to 63% relative to B1 and 7% to 12% relative to B2 under different demand levels and arrival profiles.
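To make the agent-per-intersection idea concrete, the following is a minimal sketch of the independent-agents baseline (B2): one tabular Q-learning controller per intersection, rewarded with the negative change in vehicle delay. The state encoding, action set, simulator interface, and parameter values are illustrative assumptions, not the paper's implementation; MARLIN-ATSC additionally coordinates each agent with its neighbours, which this sketch omits.

```python
# Illustrative sketch (not the authors' implementation): a tabular Q-learning
# agent controlling a single intersection, as in the independent-agents
# baseline (B2). State, action, and reward definitions are simplified.
import random
from collections import defaultdict


class SignalAgent:
    """Q-learning controller for one signalized intersection."""

    def __init__(self, actions=("extend_green", "switch_phase"),
                 alpha=0.1, gamma=0.95, epsilon=0.05):
        self.actions = actions
        self.alpha = alpha            # learning rate
        self.gamma = gamma            # discount factor
        self.epsilon = epsilon        # exploration rate
        self.q = defaultdict(float)   # Q[(state, action)] -> estimated value

    def act(self, state):
        """Epsilon-greedy action selection over the phase decisions."""
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def learn(self, state, action, reward, next_state):
        """One-step Q-learning update toward reward + discounted best next value."""
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])


# Hypothetical interaction loop with a simulator such as PARAMICS; `sim` and
# its methods are assumed interfaces. The reward is the negative increase in
# total vehicle delay at the intersection over the last decision interval.
#
# agent = SignalAgent()
# state = sim.observe(intersection_id)   # e.g. discretized queue lengths per approach
# for step in range(num_steps):
#     action = agent.act(state)
#     next_state, delay_increase = sim.apply(intersection_id, action)
#     agent.learn(state, action, -delay_increase, next_state)
#     state = next_state
```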
