Independent Reinforcement Learning for Weakly Cooperative Multiagent Traffic Control Problem

The adaptive traffic signal control (ATSC) problem can be modeled as a multiagent cooperative game among urban intersections, in which intersections cooperate to cope with the city's traffic conditions. Recently, reinforcement learning (RL) has achieved marked success in sequential decision-making problems, which motivates us to apply RL to the ATSC problem. One of the largest challenges of this problem is that each intersection typically observes the environment only partially, which limits the learning performance of RL algorithms. Considering the large number of intersections in an urban traffic environment, we use independent RL (IRL) to solve the ATSC problem in this study. We model the ATSC problem as a partially observable weakly cooperative traffic model (PO-WCTM). Unlike a traditional IRL task, which averages the returns of all agents in a fully cooperative game, PO-WCTM gives each intersection a learning goal intended to reduce the difficulty of learning to cooperate, which is also consistent with the traffic environment hypothesis. To achieve the optimal cooperative strategy of PO-WCTM, we propose an IRL algorithm called Cooperative Important Lenient Double DQN (CIL-DDQN), which extends the Double DQN (DDQN) algorithm with two mechanisms: the forgetful experience mechanism and the lenient weight training mechanism. The former decreases the importance of experiences stored in the experience replay buffer, while the latter increases the weight of experiences with high estimates and ‘leniently’ trains the DDQN neural network. Experiments in two real traffic scenarios and one simulated traffic scenario show that CIL-DDQN outperforms other methods in almost all performance indicators of ATSC.
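To make the two mechanisms more concrete, the following is a minimal sketch and not the paper's implementation. The Double DQN target follows the standard formulation (the online network selects the next action, the target network evaluates it). The weighting function is purely an assumption for illustration: the abstract does not give the formulas, so `sample_weight`, `decay`, and `leniency` are hypothetical names and forms.

```python
from typing import List


def ddqn_target(reward: float, gamma: float,
                q_online_next: List[float],
                q_target_next: List[float],
                done: bool) -> float:
    """Standard Double DQN target for one transition.

    The online network chooses the greedy next action; the target
    network supplies the value of that action.
    """
    if done:
        return reward
    best_action = max(range(len(q_online_next)),
                      key=lambda a: q_online_next[a])
    return reward + gamma * q_target_next[best_action]


def sample_weight(age: int, target: float, q_current: float,
                  decay: float = 0.99, leniency: float = 0.5) -> float:
    """Hypothetical per-sample training weight (assumed form).

    - "Forgetful" part: importance shrinks geometrically with the
      age of the stored experience.
    - "Lenient" part: experiences whose target is at least the
      current estimate keep full weight; the others are down-weighted.
    """
    forgetful = decay ** age
    lenient = 1.0 if target >= q_current else leniency
    return forgetful * lenient


def weighted_squared_loss(targets: List[float], q_values: List[float],
                          weights: List[float]) -> float:
    """Weighted mean squared TD error over a mini-batch."""
    total = sum(w * (y - q) ** 2
                for y, q, w in zip(targets, q_values, weights))
    return total / len(targets)
```

In this sketch, each independent agent (intersection) would compute `ddqn_target` on its own local observations and scale each sample's loss by `sample_weight` before updating its network; how CIL-DDQN actually defines and combines these weights is specified in the paper, not here.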
