On the Data-Driven Prediction of Arrival Times for Freight Trains on U.S. Railroads

The high capacity utilization and the predominantly single-track network topology of freight railroads in the United States cause large variability and unpredictability in train arrival times. Predicting accurate estimated times of arrival (ETAs) is an important step for railroads to increase efficiency and automation, reduce costs, and enhance customer service. We propose using machine learning algorithms trained on historical railroad operational data to generate ETAs in real time. The machine learning framework can use the many data points produced by individual trains traversing a network track segment to generate periodic ETA predictions with a single model. In this work we compare the predictive performance of linear and non-linear support vector regression, random forest regression, and deep neural network models, tested on a section of the railroad in Tennessee, USA, using over two years of historical data. Support vector regression and deep neural network models show similar results, with a maximum ETA error reduction of 26% over a statistical baseline predictor. The random forest models show over 60% error reduction compared to the baseline at some points and an average error reduction of 42%.
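
The "error reduction over a baseline" figures quoted above can be read as a relative improvement in prediction error. The sketch below illustrates that arithmetic only. The abstract does not specify the error metric or the form of the statistical baseline, so the use of mean absolute error, the median-based baseline, and all run-time values here are assumptions made for illustration, not the authors' setup.

```python
from statistics import median


def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def error_reduction_pct(baseline_error, model_error):
    """Relative improvement of the model over the baseline, in percent."""
    return 100.0 * (baseline_error - model_error) / baseline_error


# Hypothetical historical run times over one track segment, in minutes.
historical_run_times = [100, 110, 120, 130, 140]

# Assumed baseline: always predict the historical median run time.
baseline_eta = median(historical_run_times)

# Hypothetical held-out trains: actual run times and a model's predictions.
actual = [115, 125, 135]
model_predictions = [112, 127, 130]

baseline_mae = mean_absolute_error(actual, [baseline_eta] * len(actual))
model_mae = mean_absolute_error(actual, model_predictions)
reduction = error_reduction_pct(baseline_mae, model_mae)

print(f"baseline MAE: {baseline_mae:.2f} min")
print(f"model MAE:    {model_mae:.2f} min")
print(f"error reduction over baseline: {reduction:.1f}%")
```

Under this reading, a reported reduction of 42% would mean the model's error is 58% of the baseline's error on the same test trains.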
