Multiagent Reinforcement Learning for Urban Traffic Control Using Coordination Graphs

Since traffic jams are ubiquitous in the modern world, optimizing the behavior of traffic lights for efficient traffic flow is a critically important goal. Though most current traffic lights use simple heuristic protocols, more efficient controllers can be discovered automatically via multiagent reinforcement learning, where each agent controls a single traffic light. However, in previous work on this approach, agents select only locally optimal actions without coordinating their behavior. This paper extends this approach to include explicit coordination between neighboring traffic lights. Coordination is achieved using the max-plus algorithm, which estimates the optimal joint action by sending locally optimized messages among connected agents. This paper presents the first application of max-plus to a large-scale problem and thus verifies its efficacy in realistic settings. It also provides empirical evidence that max-plus performs well on cyclic graphs, though it has been proven to converge only for tree-structured graphs. Furthermore, it provides a new understanding of the properties a traffic network must have for such coordination to be beneficial and shows that max-plus outperforms previous methods on networks that possess those properties.
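The message passing described above can be illustrated with a minimal sketch. It assumes the standard max-plus update on a coordination graph: each agent sends every neighbor a message that maximizes, over its own actions, its local payoff plus the shared edge payoff plus the messages received from its other neighbors. The function name `max_plus`, the dictionary-based payoff layout, the fixed iteration count, and the toy payoff numbers are all illustrative assumptions; in the paper's setting the payoffs would come from learned value functions rather than hand-written tables.

```python
def max_plus(n_actions, local_payoff, edge_payoff, n_iters=20):
    """Approximate the best joint action on a coordination graph (sketch).

    n_actions:    dict agent -> number of actions (hypothetical layout)
    local_payoff: dict agent -> list of payoffs, one per action
    edge_payoff:  dict (i, j) -> table f[a_i][a_j], one entry per undirected edge
    """
    # Derive each agent's neighbors from the undirected edges.
    neighbours = {i: set() for i in n_actions}
    for i, j in edge_payoff:
        neighbours[i].add(j)
        neighbours[j].add(i)

    def pair(i, j, a_i, a_j):
        # Look up the shared payoff regardless of how the edge key is ordered.
        if (i, j) in edge_payoff:
            return edge_payoff[(i, j)][a_i][a_j]
        return edge_payoff[(j, i)][a_j][a_i]

    # One message per directed edge, indexed by the receiver's actions.
    msgs = {(i, j): [0.0] * n_actions[j]
            for i in neighbours for j in neighbours[i]}

    for _ in range(n_iters):
        new_msgs = {}
        for i, j in msgs:
            out = []
            for a_j in range(n_actions[j]):
                out.append(max(
                    local_payoff[i][a_i]
                    + pair(i, j, a_i, a_j)
                    + sum(msgs[(k, i)][a_i] for k in neighbours[i] if k != j)
                    for a_i in range(n_actions[i])
                ))
            # Subtract the mean so messages stay bounded on cyclic graphs.
            mean = sum(out) / len(out)
            new_msgs[(i, j)] = [v - mean for v in out]
        msgs = new_msgs

    # Each agent picks the action that maximizes local payoff plus incoming messages.
    joint = {}
    for i in neighbours:
        scores = [local_payoff[i][a] + sum(msgs[(k, i)][a] for k in neighbours[i])
                  for a in range(n_actions[i])]
        joint[i] = max(range(n_actions[i]), key=scores.__getitem__)
    return joint


# Toy example: three traffic lights in a chain 0 - 1 - 2, two actions each.
# Neighboring lights earn a payoff of 2 when they choose the same action.
actions = {0: 2, 1: 2, 2: 2}
local = {0: [1.0, 0.0], 1: [0.0, 0.0], 2: [0.0, 0.5]}
same = [[2.0, 0.0], [0.0, 2.0]]
edges = {(0, 1): same, (1, 2): same}

joint = max_plus(actions, local, edges)
print(joint)
```

In this toy chain all three agents settle on action 0, which is also the joint action with the highest total payoff. The chain is tree-structured, so this agreement is expected; on cyclic graphs the same procedure gives only an approximation, which is the case the paper evaluates empirically.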
