Automated Setting of Bus Schedule Coverage Using Unsupervised Machine Learning

Improving the efficiency of Public Transportation (PT) Networks is a major goal of any urban area authority. Advances in both location and communication devices have drastically increased the availability of the data generated by their operations. Adequate Machine Learning methods can thus be applied to identify patterns useful for improving the Schedule Plan. In this paper, we propose a fully automated learning framework to determine the best Schedule Coverage to be assigned to a given PT network based on Automatic Vehicle Location (AVL) and Automatic Passenger Counting (APC) data. We formulate this problem as a clustering one, where the best number of clusters is selected through an ad-hoc metric. This metric takes into account multiple domain constraints, computed using Sequence Mining and Probabilistic Reasoning. A case study from a large operator in Sweden was selected to validate our methodology. Experimental results suggest that changes to the Schedule Coverage are necessary. Moreover, an impact study was conducted through a large-scale simulation over the affected time period. Its results uncovered potential improvements in schedule reliability on a large scale.
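To make the clustering formulation concrete, the following is a minimal sketch of selecting the number of clusters by minimizing a score. It is not the metric described above: the paper's ad-hoc metric relies on Sequence Mining and Probabilistic Reasoning, whereas this stand-in simply adds a per-cluster penalty to the within-cluster squared error. The function names, the `penalty` parameter, and the daily run-time values are all hypothetical and chosen only for illustration.

```python
from statistics import mean


def kmeans_1d(values, k, iters=50):
    """Plain 1-D k-means with deterministic, evenly spread initial centers."""
    xs = sorted(values)
    # Pick k initial centers spread across the sorted data.
    centers = [xs[(2 * i + 1) * len(xs) // (2 * k)] for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for x in xs:
            nearest = min(range(k), key=lambda j: abs(x - centers[j]))
            groups[nearest].append(x)
        # An empty group keeps its previous center.
        new_centers = [mean(g) if g else centers[j] for j, g in enumerate(groups)]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, groups


def score(groups, centers, penalty):
    """Stand-in metric (assumption): squared error plus a cost per cluster."""
    sse = sum((x - c) ** 2 for g, c in zip(groups, centers) for x in g)
    return sse + penalty * len(centers)


def select_k(values, k_max, penalty):
    """Return the k in 1..k_max with the lowest score, and its centers."""
    best = None
    for k in range(1, k_max + 1):
        centers, groups = kmeans_1d(values, k)
        s = score(groups, centers, penalty)
        if best is None or s < best[0]:
            best = (s, k, centers)
    return best[1], best[2]


# Hypothetical mean round-trip times (minutes), one value per day.
daily_runtime = [41, 42, 40, 43, 42, 55, 56, 54, 57, 30, 31]
k, centers = select_k(daily_runtime, k_max=5, penalty=20.0)
print(k, centers)
```

In this toy setting each resulting cluster would correspond to one group of days sharing a schedule. A larger `penalty` favors fewer schedules, which loosely mirrors the role a domain constraint plays in such a metric.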
