Towards multi-agent reinforcement learning for integrated network of optimal traffic controllers (MARLIN-OTC)

Abstract: Traffic congestion can be alleviated by expanding infrastructure; however, improving the existing infrastructure through traffic control is more practical given the constraints on financial resources and physical space. The most promising control tools include ramp metering, variable message signs, and signalized intersections. Synergizing these strategies in one platform, referred to as Integrated Traffic Control (ITC), is an ultimate and challenging goal for alleviating traffic gridlock and making optimal use of the existing system capacity. Reinforcement Learning (RL) techniques have the potential to tackle the optimal traffic control problem. Game Theory (GT) is well suited to modelling distributed control systems as multiplayer games. Multi-Agent Reinforcement Learning (MARL) combines RL and GT concepts, providing a promising tool for optimal distributed traffic control. The objective of this paper is to clarify the opportunities that game theory concepts and MARL approaches offer for creating an adaptive optimal traffic control system that is decentralized yet integrated through agents' interactions. In this paper, we comparatively review and evaluate the relevant existing approaches. We then envision and introduce a novel framework that combines GT concepts and MARL: Multi-Agent Reinforcement Learning for Integrated Network of Optimal Traffic Controllers (MARLIN-OTC).
