Predicting Near-Term Train Schedule Performance and Delay Using Bi-Level Random Forests

Accurate near-term prediction of passenger train delays is critical for railway management and for giving passengers reliable arrival times. This work proposes a novel bi-level random forest approach to predict passenger train delays in the Netherlands. The primary level predicts whether a train's delay will increase, decrease, or remain unchanged within a specified time frame. The secondary level then estimates the actual delay in minutes, given the delay category predicted at the primary level. For validation, the proposed model is compared with several alternative statistical and machine-learning approaches. The results show that the proposed model achieves the best prediction accuracy among the approaches compared. Moreover, the bi-level model is computationally cheap to construct, which makes it easy to apply in practice.
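The two-level structure described above can be illustrated with a minimal Python sketch using scikit-learn. This is not the authors' implementation. The class name BiLevelForest, the two features (current delay and headway), the synthetic data, and the choice to train one regressor per true delay category are all assumptions made for illustration only.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor


class BiLevelForest:
    """Hypothetical sketch: classify the delay trend, then regress per category."""

    def __init__(self, n_estimators=50, random_state=0):
        self.n_estimators = n_estimators
        self.random_state = random_state
        self.clf = RandomForestClassifier(
            n_estimators=n_estimators, random_state=random_state
        )
        self.regs = {}  # one regressor per delay category

    def fit(self, X, trend, delay):
        # Primary level: will the delay decrease (-1), stay (0), or increase (+1)?
        self.clf.fit(X, trend)
        # Secondary level (assumption): train each regressor on the samples
        # whose true category matches.
        for c in np.unique(trend):
            mask = trend == c
            reg = RandomForestRegressor(
                n_estimators=self.n_estimators, random_state=self.random_state
            )
            self.regs[c] = reg.fit(X[mask], delay[mask])
        return self

    def predict(self, X):
        # Route every sample to the regressor of its predicted category.
        trend_hat = self.clf.predict(X)
        delay_hat = np.empty(len(X))
        for c, reg in self.regs.items():
            mask = trend_hat == c
            if mask.any():
                delay_hat[mask] = reg.predict(X[mask])
        return trend_hat, delay_hat


# Synthetic data, invented purely for this example (not from the paper).
rng = np.random.default_rng(0)
n = 600
current_delay = rng.uniform(10, 25, n)   # minutes
headway = rng.uniform(2, 10, n)          # hypothetical feature, minutes
change = np.round(3 - 0.6 * headway + rng.normal(0, 1, n))
trend = np.sign(change).astype(int)      # -1, 0, or +1
future_delay = current_delay + change    # minutes
X = np.column_stack([current_delay, headway])

model = BiLevelForest().fit(X[:500], trend[:500], future_delay[:500])
trend_hat, delay_hat = model.predict(X[500:])

print("trend accuracy:", np.mean(trend_hat == trend[500:]))
print("delay MAE (min):", np.mean(np.abs(delay_hat - future_delay[500:])))
```

Routing each sample through the regressor of its predicted category lets every secondary model specialize on one delay regime. The cost is that a misclassification at the primary level propagates to the delay estimate.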
