Multi-Agent Reinforcement Learning for Dynamic Routing Games: A Unified Paradigm

This paper aims to develop a unified paradigm that models individual travelers' learning behavior and the system's equilibrating process in a routing game among atomic selfish agents. Such a paradigm can assist policymakers in devising optimal operational and planning countermeasures under both normal and abnormal circumstances. To this end, a multi-agent reinforcement learning (MARL) paradigm is proposed in which each agent learns and updates her own en-route path choice policy while interacting with others on a transportation network. This paradigm is shown to generalize the classical notion of dynamic user equilibrium (DUE) to model-free, data-driven scenarios. We also show that the equilibrium outcomes computed from the proposed MARL paradigm coincide with DUE and dynamic system optimal (DSO), respectively, under different reward settings. In addition, to optimize a systematic objective of city planners (e.g., overall traffic condition), we formulate a bilevel optimization problem in which the upper level represents the city planners and the lower level is a multi-agent system of rational, selfish travelers, each aiming to minimize her own travel cost. We demonstrate the effect of two administrative measures, namely tolling and signal control, on traveler behavior, and show that the planners' systematic objective can be optimized through proper control. On the Braess network, the optimal toll on the central link is greater than or equal to 25; with this charge, the average travel time of the selfish agents is minimized and the Braess paradox is avoided. On a large real-world road network with 69 nodes and 166 links, the optimal signal offset on Broadway is derived as 4 seconds, with which the average travel time of all controllable agents is minimized.
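To make the lower-level dynamics concrete, the following is a minimal, self-contained sketch (not the paper's implementation) of independent Q-learning agents repeatedly choosing routes on a Braess-style network, with an upper-level toll on the central link. All parameters here (number of agents, link cost functions, learning rate, toll values) are illustrative assumptions chosen so the Braess paradox appears; the paper's actual networks, loading models, and algorithms differ.

```python
import random

# Hypothetical Braess-style setup: 60 agents pick among three routes.
# Congested links cost (load) time units; fixed links cost 45.
# Route 0: congested + fixed; route 1: fixed + congested;
# route 2: congested + central link + congested (toll charged here).
N_AGENTS, ROUTES, EPISODES = 60, 3, 500
ALPHA, EPSILON = 0.1, 0.1  # learning rate and exploration rate (assumed)

def route_times(loads):
    """Travel time of each route, given how many agents chose each route."""
    top = loads[0] + loads[2]      # load on the shared entry link
    bottom = loads[1] + loads[2]   # load on the shared exit link
    return [top + 45, 45 + bottom, top + bottom]

def simulate(toll, seed=1):
    """Run independent Q-learners; return avg travel time over last 100 episodes."""
    rng = random.Random(seed)
    q = [[0.0] * ROUTES for _ in range(N_AGENTS)]  # one Q-table per agent
    tail = []
    for ep in range(EPISODES):
        # Epsilon-greedy route choice with random tie-breaking.
        choices = []
        for i in range(N_AGENTS):
            if rng.random() < EPSILON:
                choices.append(rng.randrange(ROUTES))
            else:
                choices.append(max(range(ROUTES),
                                   key=lambda r: (q[i][r], rng.random())))
        loads = [choices.count(r) for r in range(ROUTES)]
        times = route_times(loads)
        for i, r in enumerate(choices):
            # Reward is negative generalized cost: time plus toll on route 2.
            cost = times[r] + (toll if r == 2 else 0)
            q[i][r] += ALPHA * (-cost - q[i][r])
        if ep >= EPISODES - 100:
            tail.append(sum(times[r] for r in choices) / N_AGENTS)
    return sum(tail) / len(tail)
```

With these assumed cost functions, the selfish equilibrium that uses the central link has a higher average travel time than the equilibrium without it (the paradox); a sufficiently high toll on route 2 steers the learned choices back toward the two outer routes, lowering average travel time, which mirrors the tolling experiment described above.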
