Reinforcement Learning Agent under Partial Observability for Traffic Light Control in Presence of Gridlocks

Bangkok is notorious for chronic traffic congestion due to rapid urbanization and a haphazard city plan. The Sathorn Road network is one of the most critical areas, where gridlocks are a common occurrence during rush hours. This stems from the high travel demand imposed by the dense geographical placement of three large educational institutions, combined with insufficient link capacity along rigid routes. Current solutions rely heavily on human traffic-control expertise to prevent and disentangle gridlocks by sequentially releasing queue spillbacks through inter-junction coordination. A calibrated dataset of the Sathorn Road network for the microscopic road traffic simulation package SUMO (Simulation of Urban MObility) is provided by the Chula-Sathorn SUMO Simulator (Chula-SSS). In this paper, we utilize the Chula-SSS dataset, extended with additional vehicle flows and gridlocks, to further optimize the existing traffic signal control policies with a reinforcement learning agent. Reinforcement learning has been successful in a variety of domains over the past few years. However, while a number of studies exist on reinforcement learning for adaptive traffic light control, they often lack pragmatic considerations for deployment in the physical world, especially for the traffic system infrastructure of developing countries, which is constrained by economic factors. The resulting limitation, namely the agent's partial observability of the whole network state at any given time, therefore cannot be overlooked. Under such partial observability constraints, this paper reports an investigation of an Ape-X Deep Q-Network agent applied at the critical junction during the morning rush hours from 6 AM to 9 AM, with occasional occurrences of gridlocks. The obtained results show the agent's potential to learn despite the physical limitations of traffic light control at the considered intersection within the Sathorn gridlock area. This suggests the possibility of further investigations into agent applicability for mitigating complex interconnected gridlocks in the future.
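To make the setup concrete, the following is a minimal sketch of how such an agent could be wired together: a Gym-style environment that exposes only a handful of detector readings near a single junction (modelling the partial observability constraint) over SUMO's TraCI interface, trained with RLlib's Ape-X DQN. All identifiers here (detector IDs, junction ID, phase indices, config path) are hypothetical placeholders rather than values from the Chula-SSS dataset, and the RLlib calls follow the older ray.tune API of the paper's era; this is an illustration under those assumptions, not the paper's implementation.

```python
import gym
import numpy as np
import traci  # SUMO's TraCI Python client

import ray
from ray import tune
from ray.tune.registry import register_env


class PartialObsTrafficEnv(gym.Env):
    """One controlled junction; the agent sees only nearby detector readings."""

    DETECTORS = ["e2_north", "e2_south", "e2_east", "e2_west"]  # hypothetical IDs
    TLS_ID = "sathorn_junction"                                 # hypothetical ID
    GREEN_PHASES = [0, 2]       # green-phase indices in the TLS program
    DECISION_INTERVAL = 5       # simulated seconds between agent decisions

    def __init__(self, env_config=None):
        self.sumo_cfg = "chula_sss.sumocfg"  # hypothetical config path
        self.action_space = gym.spaces.Discrete(len(self.GREEN_PHASES))
        self.observation_space = gym.spaces.Box(
            low=0.0, high=np.inf, shape=(len(self.DETECTORS),), dtype=np.float32)

    def reset(self):
        if traci.isLoaded():
            traci.close()
        traci.start(["sumo", "-c", self.sumo_cfg])
        return self._observe()

    def step(self, action):
        # Switch the junction to the chosen green phase, then let SUMO advance.
        traci.trafficlight.setPhase(self.TLS_ID, self.GREEN_PHASES[action])
        for _ in range(self.DECISION_INTERVAL):
            traci.simulationStep()
        obs = self._observe()
        # Reward: penalize halted vehicles on the observed approaches only.
        reward = -float(obs.sum())
        done = traci.simulation.getMinExpectedNumber() == 0
        return obs, reward, done, {}

    def _observe(self):
        # Partial observability: a few lane-area (E2) detectors near the
        # junction stand in for the full network state.
        return np.array(
            [traci.lanearea.getLastStepHaltingNumber(d) for d in self.DETECTORS],
            dtype=np.float32)


if __name__ == "__main__":
    register_env("partial_obs_tls", lambda cfg: PartialObsTrafficEnv(cfg))
    ray.init()
    tune.run(
        "APEX",  # distributed Ape-X DQN as implemented in RLlib
        stop={"training_iteration": 100},
        config={"env": "partial_obs_tls", "num_workers": 4},
    )
```

The essential design point this sketch illustrates is that the observation vector subscribes to a small, fixed set of detectors rather than querying full network state, so the learned policy must cope with exactly the kind of limited sensing that economically constrained infrastructure imposes.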

[1] Minoru Ito et al., Adaptive Traffic Signal Control: Deep Reinforcement Learning Algorithm with Experience Replay and Target Network, 2017, arXiv.

[2] Tom Schaul et al., Dueling Network Architectures for Deep Reinforcement Learning, 2015, ICML.

[3] Jonathan H. Connell and Sridhar Mahadevan (eds.), Robot Learning, 1993, Kluwer, Boston.

[4] Saiedeh N. Razavi et al., Using a Deep Reinforcement Learning Agent for Traffic Signal Control, 2016, arXiv.

[5] Pascal Poupart et al., On Improving Deep Reinforcement Learning for POMDPs, 2017, arXiv.

[6] Zhenhui Li et al., IntelliLight: A Reinforcement Learning Approach for Intelligent Traffic Light Control, 2018, KDD.

[7] Martín Abadi et al., TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems, 2016, arXiv.

[8] Shane Legg et al., Human-level control through deep reinforcement learning, 2015, Nature.

[9] Tom Schaul et al., Prioritized Experience Replay, 2015, ICLR.

[10] Alex Graves et al., Playing Atari with Deep Reinforcement Learning, 2013, arXiv.

[11] Yun-Pang Flötteröd et al., Microscopic Traffic Simulation using SUMO, 2018, 21st International Conference on Intelligent Transportation Systems (ITSC).

[12] David Budden et al., Distributed Prioritized Experience Replay, 2018, ICLR.

[13] Michael I. Jordan et al., Ray: A Distributed Framework for Emerging AI Applications, 2017, OSDI.

[14] Shane Legg et al., Noisy Networks for Exploration, 2017, ICLR.

[15] Marc G. Bellemare et al., A Distributional Perspective on Reinforcement Learning, 2017, ICML.

[16] Leslie Pack Kaelbling et al., Planning and Acting in Partially Observable Stochastic Domains, 1998, Artificial Intelligence.

[17] Peter Corcoran et al., Traffic Light Control Using Deep Policy-Gradient and Value-Function Based Reinforcement Learning, 2017, arXiv.

[18] Maxim Raya et al., TraCI: an interface for coupling road traffic and network simulators, 2008, CNS '08.

[19] Chaodit Aswakul et al., Chula-SSS: Developmental Framework for Signal Actuated Logics on SUMO Platform in Over-saturated Sathorn Road Network Scenario, 2018.

[20] David Silver et al., Deep Reinforcement Learning with Double Q-Learning, 2015, AAAI.

[21] Long-Ji Lin, Self-Improving Reactive Agents Based on Reinforcement Learning, Planning and Teaching, 1992, Machine Learning.

[22] T. Urbanik et al., Reinforcement learning-based multi-agent system for network traffic signal control, 2010.

[23] Tom Schaul et al., Rainbow: Combining Improvements in Deep Reinforcement Learning, 2017, AAAI.

[24] Baher Abdulhai et al., Multiagent Reinforcement Learning for Integrated Network of Adaptive Traffic Signal Controllers (MARLIN-ATSC): Methodology and Large-Scale Application on Downtown Toronto, 2013, IEEE Transactions on Intelligent Transportation Systems.

[25] Michael I. Jordan et al., RLlib: Abstractions for Distributed Reinforcement Learning, 2017, ICML.