Decentralized learning for traffic signal control

In this paper, we study the problem of obtaining the optimal order of the phase sequence [14] in a road network for efficiently managing traffic flow. We model this problem as a Markov decision process (MDP). The problem is hard to solve when all junctions in the road network are considered simultaneously, so we propose a decentralized multi-agent reinforcement learning (MARL) algorithm that treats each junction in the road network as a separate agent (controller). Each agent optimizes the order of its phase sequence using Q-learning with either an ε-greedy or a UCB [3] exploration strategy. Coordination between junctions is achieved through the cost feedback signal received from the neighbouring junctions: the learning algorithm at each agent uses this feedback to update its Q-factors. We show through simulations in VISSIM that our algorithms perform significantly better than the standard fixed signal timing (FST), saturation balancing (SAT) [14], and round-robin multi-agent reinforcement learning [11] algorithms on two real road networks.
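To make the per-junction learning step concrete, the following is a minimal Python sketch of a single junction agent of the kind described above: tabular Q-learning over candidate phase-sequence orders, with either ε-greedy or UCB-style exploration, and a reward that folds in cost feedback from neighbouring junctions. The class name, state/cost representations, and parameter values are illustrative assumptions, not the paper's exact algorithm or notation.

```python
import math
import random
from collections import defaultdict


class JunctionAgent:
    """One Q-learning agent per junction (illustrative sketch, not the paper's exact method)."""

    def __init__(self, phase_orders, alpha=0.1, gamma=0.9,
                 epsilon=0.1, ucb_c=2.0, exploration="epsilon"):
        self.actions = phase_orders          # candidate orderings of the phase sequence
        self.alpha, self.gamma = alpha, gamma
        self.epsilon, self.ucb_c = epsilon, ucb_c
        self.exploration = exploration       # "epsilon" or "ucb"
        self.Q = defaultdict(float)          # Q-factors indexed by (state, action)
        self.counts = defaultdict(int)       # visit counts, used by UCB
        self.t = 0                           # total number of decisions taken

    def select_action(self, state):
        self.t += 1
        if self.exploration == "epsilon":
            # ε-greedy: explore uniformly with probability ε, otherwise act greedily.
            if random.random() < self.epsilon:
                return random.choice(self.actions)
            return max(self.actions, key=lambda a: self.Q[(state, a)])

        # UCB1-style exploration: prefer phase orders with high Q-value or few visits.
        def ucb_score(a):
            n = self.counts[(state, a)]
            if n == 0:
                return float("inf")
            return self.Q[(state, a)] + self.ucb_c * math.sqrt(math.log(self.t) / n)

        return max(self.actions, key=ucb_score)

    def update(self, state, action, own_cost, neighbour_costs, next_state):
        # Coordination: the reward combines the junction's own cost with the
        # cost feedback signals received from its neighbouring junctions.
        reward = -(own_cost + sum(neighbour_costs))
        self.counts[(state, action)] += 1
        best_next = max(self.Q[(next_state, a)] for a in self.actions)
        td_target = reward + self.gamma * best_next
        self.Q[(state, action)] += self.alpha * (td_target - self.Q[(state, action)])
```

In use, each agent would observe a local state (for example, discretized queue lengths on its approach lanes), pick a phase-sequence order for the next cycle, and then call `update` with its own measured cost and the costs reported by adjacent junctions; the decentralized structure means no agent needs the joint state of the whole network.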

[1] Shalabh Bhatnagar, et al. Threshold Tuning Using Stochastic Optimization for Graded Signal Control, 2012, IEEE Transactions on Vehicular Technology.

[2] Shalabh Bhatnagar, et al. Multi-agent reinforcement learning for traffic signal control, 2014, 17th International IEEE Conference on Intelligent Transportation Systems (ITSC).

[3] Shimon Whiteson, et al. Multiagent Reinforcement Learning for Urban Traffic Control Using Coordination Graphs, 2008, ECML/PKDD.

[4] Vinny Cahill, et al. A Collaborative Reinforcement Learning Approach to Urban Traffic Control Optimization, 2008, IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology.

[5] Silvia Richter, et al. Learning Road Traffic Control: Towards Practical Traffic Control Using Policy Gradients, 2006.

[6] Peter Dayan, et al. Q-learning, 1992, Machine Learning.

[7] Ana L. C. Bazzan, et al. Opportunities for multiagent systems and multiagent reinforcement learning in traffic control, 2009, Autonomous Agents and Multi-Agent Systems.

[8] Discrete simultaneous perturbation stochastic approximation on loss function with noisy measurements, 2011, Proceedings of the 2011 American Control Conference.

[9] Chen Cai, et al. Adaptive traffic signal control using approximate dynamic programming, 2009.

[10] Shimon Whiteson, et al. Traffic Light Control by Multiagent Reinforcement Learning Systems, 2010, Interactive Collaborative Information Systems.

[11] Wang, et al. Review of road traffic control strategies, 2003, Proceedings of the IEEE.

[12] Peter Auer, et al. Finite-time Analysis of the Multiarmed Bandit Problem, 2002, Machine Learning.

[13] Baher Abdulhai, et al. An agent-based learning towards decentralized and coordinated traffic signal control, 2010, 13th International IEEE Conference on Intelligent Transportation Systems.

[14] T. Urbanik, et al. Reinforcement learning-based multi-agent system for network traffic signal control, 2010.

[15] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.

[16] J. Y. Luk, et al. TRANSYT: traffic network study tool, 1990.

[17] Juan C. Medina, et al. Traffic signal control using reinforcement learning and the max-plus algorithm as a coordinating strategy, 2012, 15th International IEEE Conference on Intelligent Transportation Systems.

[18] Monireh Abdoos, et al. Traffic light control in non-stationary environments based on multi agent Q-learning, 2011, 14th International IEEE Conference on Intelligent Transportation Systems (ITSC).

[19] Shalabh Bhatnagar, et al. Reinforcement Learning With Function Approximation for Traffic Signal Control, 2011, IEEE Transactions on Intelligent Transportation Systems.

[20] Thomas G. Dietterich. What is machine learning?, 2020, Archives of Disease in Childhood.