A Distributed Control Method for Urban Networks Using Multi-Agent Reinforcement Learning Based on Regional Mixed Strategy Nash-Equilibrium

Disturbances such as fluctuation and disequilibrium of traffic demand can cause congestion in urban traffic networks. This paper designs a distributed control method that prevents disturbance-induced congestion by integrating Multi-Agent Reinforcement Learning (MARL) with a regional Mixed Strategy Nash-Equilibrium (MSNE). To enhance the disturbance-rejection performance of Urban Network Traffic Control (UNTC), the regional MSNE concept models the competitive relationship between each agent and its neighboring agents and thereby improves the decision-making process of MARL. To avoid getting trapped in local optima, the learning rate of the modified MARL is made self-adaptive and is defined using the Jensen-Shannon (JS) divergence. A two-way rectangular grid network with nine intersections is modeled with a Cell Transmission Model (CTM). A probability distribution mechanism that dynamically and discretely updates the turn ratio of each approach represents the segmented route-decision process of the vehicles. The effectiveness of the proposed control method is evaluated through simulations on the grid network. The results show how major disturbances affect the performance of the proposed control method: fluctuation of the vehicle arrival rate, fluctuation of traffic demand (e.g., a rapidly rising flow and extreme changes in origin-destination distribution), and disequilibrium of traffic demand (e.g., different arrival flows at each boundary of the urban network). These results can inform improvements to the state of the art in reducing urban network traffic congestion caused by such disturbances.
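
As a rough illustration of a JS-divergence-driven learning rate, the sketch below computes the Jensen-Shannon divergence between two discrete mixed strategies and maps it linearly onto a learning-rate interval. The abstract does not give the paper's actual formula, so the linear mapping, the function names, and the bounds `alpha_min` and `alpha_max` are all assumptions made here for illustration only.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in bits for discrete distributions.

    Terms with p_i == 0 contribute nothing, by the usual convention.
    """
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits; symmetric and bounded in [0, 1]."""
    # Mixture distribution; m_i > 0 wherever p_i > 0 or q_i > 0,
    # so the KL terms below never divide by zero.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def adaptive_learning_rate(prev_policy, new_policy, alpha_min=0.05, alpha_max=0.5):
    """Hypothetical mapping (an assumption, not the paper's rule).

    The more the mixed strategy changed between two updates, the larger
    the learning rate, scaled linearly between alpha_min and alpha_max.
    """
    d = js_divergence(prev_policy, new_policy)
    return alpha_min + (alpha_max - alpha_min) * d


if __name__ == "__main__":
    # Two mixed strategies over three signal phases (made-up numbers).
    before = [0.6, 0.3, 0.1]
    after = [0.2, 0.5, 0.3]
    print(js_divergence(before, after))
    print(adaptive_learning_rate(before, after))
```

Using base-2 logarithms keeps the divergence in [0, 1], which makes it convenient to rescale; other mappings (for example, a decreasing one) would be equally plausible readings of the abstract.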
