Dynamic holding control to avoid bus bunching: A multi-agent deep reinforcement learning framework

Abstract

Bus bunching has been a long-standing problem that undermines the efficiency and reliability of public transport services. The most popular countermeasure in practice is to introduce static and dynamic holding control. However, most existing holding control strategies rely mainly on local information relative to a pre-specified headway or schedule, while the global coordination of the whole bus fleet and its long-term effects are often overlooked. To efficiently incorporate global coordination and long-term operation into bus holding, we propose a multi-agent deep reinforcement learning (MDRL) framework that develops dynamic and flexible holding control strategies for a bus route. Specifically, we model each bus as an agent that interacts not only with its leader and follower but also with all other vehicles in the fleet. To better explore potential strategies, we develop an effective headway-based reward function within the proposed framework. In the learning framework, we model fleet coordination with a basic actor-critic scheme combined with a joint action tracker, which better characterizes the complex interactions among agents during policy learning, and we apply proximal policy optimization (PPO) to improve learning performance. We conduct extensive numerical experiments to evaluate the proposed MDRL framework against multiple baseline models that rely only on local information. Our results demonstrate the superiority of the proposed framework and show the promise of applying MDRL to the coordinated control of public transport fleets in real-world operations.
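Since the abstract centers on a headway-based reward and PPO-style policy updates, the minimal Python sketch below illustrates the general idea. The functional forms, names, and parameters (`headway_reward`, `alpha`, `eps`) are illustrative assumptions, not the paper's actual implementation: a common headway-based reward penalizes the imbalance between a bus's forward and backward headways, and the clipped surrogate is the standard PPO objective rather than anything specific to this framework.

```python
import numpy as np

def headway_reward(forward_headway, backward_headway, alpha=1.0):
    """Illustrative headway-based reward (an assumed form, not the paper's).

    Rewards a bus for keeping its forward headway (time gap to the bus
    ahead, in seconds) equal to its backward headway (gap to the bus
    behind); perfectly even spacing -- the anti-bunching target -- gives
    the maximum reward of 0.
    """
    imbalance = abs(forward_headway - backward_headway)
    return -alpha * imbalance / (forward_headway + backward_headway + 1e-6)

def ppo_clip_objective(ratio, advantage, eps=0.2):
    """Standard PPO clipped surrogate, shown only to illustrate the kind of
    policy update each bus agent would apply.

    ratio:     pi_new(a|s) / pi_old(a|s) for sampled actions (array)
    advantage: advantage estimates from the critic (array)
    """
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    # Maximize the clipped surrogate (equivalently, minimize its negative).
    return np.minimum(unclipped, clipped).mean()

# Example: a bus 400 s behind its leader but only 200 s ahead of its
# follower is drifting toward bunching and receives a negative reward.
print(headway_reward(400.0, 200.0))  # approx -0.333
```

In this assumed form the reward is shared structure only: each agent observes its own pair of headways, while global coordination would come from the critic and the joint action tracker described in the abstract.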
