Deep Deterministic Policy Gradient for Urban Traffic Light Control

Traffic light timing optimization is still an active line of research despite the wealth of scientific literature on the topic, and the problem remains unsolved for any non-toy scenario. One of the key issues in traffic light optimization is the large scale of the input information available to the controlling agent, namely all the traffic data that is continually sampled by the traffic detectors covering the urban network. In the past, this issue has forced researchers to focus on agents that work on localized parts of the traffic network, typically individual intersections, and to coordinate the individual agents in a multi-agent setup. To overcome the large scale of the available state information, we propose to rely on the ability of deep learning approaches to handle large input spaces, in the form of the Deep Deterministic Policy Gradient (DDPG) algorithm. We performed several experiments with a range of models, from a very simple one (a single intersection) to a more complex one (a large city section).
