Continuous residual reinforcement learning for traffic signal control optimization

Traffic signal control can naturally be framed as a reinforcement learning problem. Unfortunately, it is among the most difficult classes of reinforcement learning problems owing to its large state space. A straightforward way to address this challenge is to control traffic signals with continuous reinforcement learning methods. Although such methods have been successful in traffic signal control, they may become unstable and fail to converge to near-optimal solutions. We develop adaptive traffic signal controllers based on continuous residual reinforcement learning (CRL-TSC), which is more stable. The effect of three feature functions is investigated empirically in a microscopic traffic simulation. Furthermore, the effects of including departing streets, enlarging the action set, and using the spatial distribution of vehicles on the performance of CRL-TSCs are assessed. The results show that the best CRL-TSC configuration reduces average travel time by 15% compared with an optimized fixed-time controller.
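The stability advantage of residual reinforcement learning comes from descending the gradient of the full Bellman error rather than the semi-gradient used in direct temporal-difference learning. The following is a minimal sketch of that idea, assuming a linear value function over traffic-state features; the function names, parameters, and feature vectors are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def residual_td_update(w, phi_s, phi_next, reward, alpha=0.01, gamma=0.99):
    """One residual-gradient update of the linear value weights w.

    Unlike the direct (semi-gradient) TD update, the correction term
    (phi_s - gamma * phi_next) also accounts for the dependence of the
    bootstrapped target on w, which is what makes residual learning more
    stable under function approximation.
    """
    td_error = reward + gamma * np.dot(w, phi_next) - np.dot(w, phi_s)
    # Gradient of the squared TD error with respect to w, including the
    # target term: descend it to reduce the Bellman residual.
    w += alpha * td_error * (phi_s - gamma * phi_next)
    return w

# Hypothetical usage with a 4-dimensional traffic-state feature vector
# (e.g. normalized queue lengths on the approaching streets).
w = np.zeros(4)
phi_s = np.array([0.6, 0.2, 0.1, 0.3])      # features of the current state
phi_next = np.array([0.4, 0.3, 0.2, 0.2])   # features of the next state
w = residual_td_update(w, phi_s, phi_next, reward=-1.2)
```

In a direct TD update the last line would be `w += alpha * td_error * phi_s`; the extra `- gamma * phi_next` term is the residual correction.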
