Decentralized learning for traffic signal control

In this paper, we study the problem of obtaining the optimal order of the phase sequence [14] in a road network for efficiently managing traffic flow. We model this problem as a Markov decision process (MDP). The problem is hard to solve when all junctions in the road network are considered simultaneously, so we propose a decentralized multi-agent reinforcement learning (MARL) algorithm that treats each junction in the road network as a separate agent (controller). Each agent optimizes the order of its phase sequence using Q-learning with either an ε-greedy or a UCB [3] exploration strategy. Coordination between junctions is achieved through the cost feedback signal received from the neighbouring junctions: the learning algorithm at each agent uses this feedback to update its Q-factors. We show through simulations in VISSIM that our algorithms perform significantly better than the standard fixed signal timing (FST), saturation balancing (SAT) [14], and round-robin multi-agent reinforcement learning [11] algorithms on two real road networks.
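To make the per-junction learning step concrete, the following is a minimal Python sketch of a single junction agent of the kind described above: tabular Q-learning over candidate phase-sequence orders, with either ε-greedy or UCB-style exploration, and a reward that folds in cost feedback from neighbouring junctions. The class name, state/cost representations, and parameter values are illustrative assumptions, not the paper's exact algorithm or notation.

```python
import math
import random
from collections import defaultdict


class JunctionAgent:
    """One Q-learning agent per junction (illustrative sketch, not the paper's exact method)."""

    def __init__(self, phase_orders, alpha=0.1, gamma=0.9,
                 epsilon=0.1, ucb_c=2.0, exploration="epsilon"):
        self.actions = phase_orders          # candidate orderings of the phase sequence
        self.alpha, self.gamma = alpha, gamma
        self.epsilon, self.ucb_c = epsilon, ucb_c
        self.exploration = exploration       # "epsilon" or "ucb"
        self.Q = defaultdict(float)          # Q-factors indexed by (state, action)
        self.counts = defaultdict(int)       # visit counts, used by UCB
        self.t = 0                           # total number of decisions taken

    def select_action(self, state):
        self.t += 1
        if self.exploration == "epsilon":
            # ε-greedy: explore uniformly with probability ε, otherwise act greedily.
            if random.random() < self.epsilon:
                return random.choice(self.actions)
            return max(self.actions, key=lambda a: self.Q[(state, a)])

        # UCB1-style exploration: prefer phase orders with high Q-value or few visits.
        def ucb_score(a):
            n = self.counts[(state, a)]
            if n == 0:
                return float("inf")
            return self.Q[(state, a)] + self.ucb_c * math.sqrt(math.log(self.t) / n)

        return max(self.actions, key=ucb_score)

    def update(self, state, action, own_cost, neighbour_costs, next_state):
        # Coordination: the reward combines the junction's own cost with the
        # cost feedback signals received from its neighbouring junctions.
        reward = -(own_cost + sum(neighbour_costs))
        self.counts[(state, action)] += 1
        best_next = max(self.Q[(next_state, a)] for a in self.actions)
        td_target = reward + self.gamma * best_next
        self.Q[(state, action)] += self.alpha * (td_target - self.Q[(state, action)])
```

In use, each agent would observe a local state (for example, discretized queue lengths on its approach lanes), pick a phase-sequence order for the next cycle, and then call `update` with its own measured cost and the costs reported by adjacent junctions; the decentralized structure means no agent needs the joint state of the whole network.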

[1] Shalabh Bhatnagar, et al. Threshold Tuning Using Stochastic Optimization for Graded Signal Control, 2012, IEEE Transactions on Vehicular Technology.

[2] Shalabh Bhatnagar, et al. Multi-agent reinforcement learning for traffic signal control, 2014, 17th International IEEE Conference on Intelligent Transportation Systems (ITSC).

[3] Shimon Whiteson, et al. Multiagent Reinforcement Learning for Urban Traffic Control Using Coordination Graphs, 2008, ECML/PKDD.

[4] Vinny Cahill, et al. A Collaborative Reinforcement Learning Approach to Urban Traffic Control Optimization, 2008, IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology.

[5] Silvia Richter, et al. Learning Road Traffic Control: Towards Practical Traffic Control Using Policy Gradients, 2006.

[6] Peter Dayan, et al. Q-learning, 1992, Machine Learning.

[7] Ana L. C. Bazzan, et al. Opportunities for multiagent systems and multiagent reinforcement learning in traffic control, 2009, Autonomous Agents and Multi-Agent Systems.

[8] Discrete simultaneous perturbation stochastic approximation on loss function with noisy measurements, 2011, Proceedings of the 2011 American Control Conference.

[9] Chen Cai, et al. Adaptive traffic signal control using approximate dynamic programming, 2009.

[10] Shimon Whiteson, et al. Traffic Light Control by Multiagent Reinforcement Learning Systems, 2010, Interactive Collaborative Information Systems.

[11] Wang, et al. Review of road traffic control strategies, 2003, Proceedings of the IEEE.

[12] Peter Auer, et al. Finite-time Analysis of the Multiarmed Bandit Problem, 2002, Machine Learning.

[13] Baher Abdulhai, et al. An agent-based learning towards decentralized and coordinated traffic signal control, 2010, 13th International IEEE Conference on Intelligent Transportation Systems.

[14] T. Urbanik, et al. Reinforcement learning-based multi-agent system for network traffic signal control, 2010.

[15] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.

[16] J. Y. Luk, et al. TRANSYT: traffic network study tool, 1990.

[17] Juan C. Medina, et al. Traffic signal control using reinforcement learning and the max-plus algorithm as a coordinating strategy, 2012, 15th International IEEE Conference on Intelligent Transportation Systems.

[18] Monireh Abdoos, et al. Traffic light control in non-stationary environments based on multi agent Q-learning, 2011, 14th International IEEE Conference on Intelligent Transportation Systems (ITSC).

[19] Shalabh Bhatnagar, et al. Reinforcement Learning With Function Approximation for Traffic Signal Control, 2011, IEEE Transactions on Intelligent Transportation Systems.

[20] Thomas G. Dietterich. What is machine learning?, 2020, Archives of Disease in Childhood.