Application of a Deep Deterministic Policy Gradient Algorithm for Energy-Aimed Timetable Rescheduling Problem

Reinforcement learning has potential in intelligent transportation because of its generality and real-time capability. The Q-learning algorithm, one of the early reinforcement learning algorithms, has merits for solving the train timetable rescheduling (TTR) problem. However, it has two shortcomings: limited action dimensionality and a slow convergence rate. In this paper, a deep deterministic policy gradient (DDPG) algorithm is applied to solve the energy-aimed train timetable rescheduling (ETTR) problem. As a reinforcement learning algorithm, DDPG meets the real-time requirements of the ETTR problem and adapts to random disturbances. In contrast to Q-learning, DDPG has continuous state and action spaces. After sufficient training, the DDPG-based learning agent takes appropriate actions when random disturbances occur, continuously adjusting the cruising speed and dwell time of each train in a metro network. Although training requires thousands of episodes, each policy decision during a testing episode takes very little time. Models of the metro network, based on a real case of Shanghai Metro Line 1, are established as the training and testing environment. To validate the energy-saving effect and the real-time performance of the proposed algorithm, four experiments are designed and conducted. Compared with a no-action strategy, the results show that the proposed algorithm runs in real time and saves a significant percentage of energy under random disturbances.
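To make the continuous-action idea concrete, the following is a minimal pure-Python sketch of the core DDPG update pieces for a single train whose action is a (cruising speed, dwell time) pair: a deterministic actor squashed into action bounds, the critic's TD target computed with target networks, the deterministic policy gradient step for the actor, and the soft target update. It is an illustration under stated assumptions, not the authors' implementation. The action bounds, the three state features, the toy transition, and all function names are hypothetical; linear approximators stand in for the deep networks DDPG actually uses; and Gaussian exploration noise is swapped in for the Ornstein-Uhlenbeck noise used in the original DDPG paper.

```python
import math
import random

# Hypothetical bounds for illustration (not taken from the paper):
# index 0 = cruising speed in km/h, index 1 = dwell time in s.
ACTION_LOW = [40.0, 20.0]
ACTION_HIGH = [80.0, 60.0]

def scale_action(z):
    """Squash raw outputs with tanh, then map them into [low, high]."""
    return [lo + (hi - lo) * (math.tanh(zi) + 1.0) / 2.0
            for zi, lo, hi in zip(z, ACTION_LOW, ACTION_HIGH)]

class LinearActor:
    """Deterministic policy mu(s): one linear layer plus tanh scaling."""
    def __init__(self, n_state, n_action):
        self.W = [[0.0] * n_state for _ in range(n_action)]
        self.b = [0.0] * n_action

    def raw(self, s):
        return [sum(w * x for w, x in zip(row, s)) + b
                for row, b in zip(self.W, self.b)]

    def act(self, s):
        return scale_action(self.raw(s))

class LinearCritic:
    """Action value Q(s, a) = w_s . s + w_a . a + b (linear for brevity)."""
    def __init__(self, n_state, n_action):
        self.w_s = [0.0] * n_state
        self.w_a = [0.0] * n_action
        self.b = 0.0

    def q(self, s, a):
        return (sum(w * x for w, x in zip(self.w_s, s))
                + sum(w * x for w, x in zip(self.w_a, a)) + self.b)

def explore(action, sigma, rng):
    """Gaussian exploration noise, clipped back into the action bounds."""
    return [min(hi, max(lo, a + rng.gauss(0.0, sigma * (hi - lo))))
            for a, lo, hi in zip(action, ACTION_LOW, ACTION_HIGH)]

def td_target(r, s_next, done, actor_t, critic_t, gamma=0.99):
    """y = r + gamma * Q'(s', mu'(s')), computed with the target networks."""
    if done:
        return r
    return r + gamma * critic_t.q(s_next, actor_t.act(s_next))

def critic_step(critic, s, a, y, lr):
    """One SGD step on the squared TD error (Q(s, a) - y)^2 / 2."""
    err = critic.q(s, a) - y
    critic.w_s = [w - lr * err * x for w, x in zip(critic.w_s, s)]
    critic.w_a = [w - lr * err * x for w, x in zip(critic.w_a, a)]
    critic.b -= lr * err
    return err

def actor_step(actor, critic, s, lr):
    """Deterministic policy gradient: ascend dQ/da * da/dtheta at a = mu(s)."""
    for i, zi in enumerate(actor.raw(s)):
        half = (ACTION_HIGH[i] - ACTION_LOW[i]) / 2.0
        g = critic.w_a[i] * half * (1.0 - math.tanh(zi) ** 2)  # dQ/dz_i
        actor.W[i] = [w + lr * g * x for w, x in zip(actor.W[i], s)]
        actor.b[i] += lr * g

def soft_update(target, source, tau):
    """theta' <- tau * theta + (1 - tau) * theta' for every parameter."""
    for name, value in vars(source).items():
        tgt = getattr(target, name)
        if isinstance(value, float):
            setattr(target, name, tau * value + (1 - tau) * tgt)
        elif isinstance(value[0], list):
            for row_t, row_s in zip(tgt, value):
                for j in range(len(row_t)):
                    row_t[j] = tau * row_s[j] + (1 - tau) * row_t[j]
        else:
            for j in range(len(tgt)):
                tgt[j] = tau * value[j] + (1 - tau) * tgt[j]

if __name__ == "__main__":
    rng = random.Random(0)
    actor, critic = LinearActor(3, 2), LinearCritic(3, 2)
    actor_t, critic_t = LinearActor(3, 2), LinearCritic(3, 2)
    s, s_next, r = [1.0, 0.5, -0.2], [0.9, 0.4, -0.1], -1.5  # toy transition
    a = explore(actor.act(s), 0.1, rng)
    y = td_target(r, s_next, False, actor_t, critic_t)
    critic_step(critic, s, a, y, lr=1e-5)
    actor_step(actor, critic, s, lr=1e-3)
    soft_update(actor_t, actor, tau=0.005)
    soft_update(critic_t, critic, tau=0.005)
    print(actor.act(s))  # speed/dwell for one train after one update
```

The tanh squashing keeps every action inside its bounds without clipping, which is why the actor can output speed and dwell time as real numbers rather than choosing from a discretized table as Q-learning would. In a full agent these steps would run on mini-batches drawn from a replay buffer, and the reward would encode the energy objective.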
