Reinforcement learning approach for optimal control of multiple electric locomotives in a heavy-haul freight train: A Double-Switch-Q-network architecture

Abstract Electric locomotives provide high tractive power for fast acceleration of heavy-haul freight trains and significantly reduce energy consumption through regenerative braking. This paper proposes a reinforcement learning (RL) approach for the optimal control of multiple electric locomotives in a heavy-haul freight train that requires neither prior knowledge of the train dynamics nor a pre-designed velocity profile. The optimization takes velocity, energy consumption and coupler force as objectives, subject to constraints on locomotive notches and their rates of change, speed restrictions, traction and regenerative braking. Because the problem has a continuous state space and a large action space, and adjacent actions have similar effects on the state, we propose a Double-Switch Q-network (DSQ-network) architecture for fast approximation of the action-value function. The architecture enhances parameter sharing across states and actions and denoises the action-value function. In the numerical experiments, we test the DSQ-network in 28 cases using data from the China Railways HXD3B electric locomotive. The results indicate that, compared with table-lookup Q-learning, the DSQ-network converges much faster and uses less storage space in the optimal control of electric locomotives. We also analyze (1) the influence of ramps and speed restrictions on the optimal policy, and (2) the inter-dependent and inter-conditioned relationships between the multiple optimization objectives. Finally, we discuss the factors that influence the convergence rate and solution accuracy of the DSQ-network, based on visualization of the high-dimensional value functions.
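For readers unfamiliar with the table-lookup Q-learning baseline mentioned above, the following is a minimal sketch of one update step with a weighted multi-objective reward and a notch change-rate constraint. It is not the DSQ-network and is not taken from the paper: the reward weights, the discretized state tuple, the notch range of -3 to 3, and the function names are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def reward(speed_error, energy, coupler_force, weights=(1.0, 0.1, 0.01)):
    """Hypothetical weighted-sum reward (assumed form, not the paper's).

    Penalizes deviation from a target speed, energy use, and coupler force.
    """
    w_v, w_e, w_f = weights
    return -(w_v * abs(speed_error) + w_e * energy + w_f * abs(coupler_force))


def allowed_actions(current_notch, notches, max_step=1):
    """Notches reachable in one step under an assumed change-rate limit."""
    return [n for n in notches if abs(n - current_notch) <= max_step]


def q_update(q, state, action, r, next_state, actions, alpha=0.1, gamma=0.95):
    """One standard table-lookup Q-learning update.

    q maps (state, action) pairs to values; unseen pairs default to 0.0.
    """
    best_next = max(q[(next_state, a)] for a in actions)
    td_target = r + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


# Illustrative values only: states are (position_bin, speed_bin) tuples.
notches = list(range(-3, 4))
q = defaultdict(float)
r = reward(speed_error=2.0, energy=5.0, coupler_force=100.0)
new_value = q_update(
    q,
    state=(0, 0),
    action=1,
    r=r,
    next_state=(1, 1),
    actions=allowed_actions(1, notches),
)
```

In a table of this kind, every discretized state and notch combination needs its own entry, which is the storage cost that a function approximator such as a Q-network avoids by sharing parameters across states and actions.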
