Towards a Better Understanding of Public Transportation Traffic: A Case Study of the Washington, DC Metro

Traffic prediction is essential to many applications, ranging from individual trip planning to urban planning. Existing work mainly focuses on traffic prediction on road networks, yet public transportation accounts for a significant share of overall human mobility and passenger volume. For example, the Washington, DC metro carries on average 600,000 passengers per weekday. In this work, we address the problem of modeling, classifying, and predicting passenger volume in public transportation systems. We study the case of the Washington, DC metro using fare card data, specifically passenger inflow and outflow at stations. To reduce the dimensionality of the data, we apply principal component analysis to extract latent features for different stations and for different calendar days. Our unsupervised clustering results demonstrate that these latent features are highly discriminative: they allow us to derive different station types (residential, commercial, and mixed) and to effectively classify and identify the passenger flow of “unknown” stations. Finally, we show that this classification can be applied to predict the passenger volume at stations: by learning the latent features of stations over an initial period, we are able to predict the flow for the following hours. Extensive experiments against a baseline neural network and two naive periodicity approaches show a considerable accuracy improvement for the latent-feature-based approach.
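
The station-level pipeline described above can be sketched in plain Python. It covers dimensionality reduction with principal component analysis, unsupervised clustering of the latent features, and classification of an “unknown” station. This is a minimal illustration under stated assumptions, not the authors' implementation. The station profiles are synthetic: four made-up time slots of inflow followed by outflow, normalized to shares of daily volume. PCA is computed by power iteration with deflation. Because the clustering method is not specified here, k-means with farthest-point initialization is used as a stand-in. An unknown station is assigned to the nearest cluster center. All names (`BASE`, `noisy_profile`, `pca`, `project`, `kmeans`) are hypothetical.

```python
import math
import random

# Synthetic base profiles: entries (inflow) and exits (outflow) per time slot
# [morning, midday, evening, night]. Made-up numbers, NOT real fare card data.
BASE = {
    "residential": ([900, 200, 300, 100], [150, 200, 800, 150]),
    "commercial": ([150, 250, 850, 100], [950, 250, 250, 80]),
    "mixed": ([500, 250, 550, 100], [550, 250, 500, 120]),
}


def noisy_profile(kind, rng):
    """One station: inflow slots then outflow slots, as shares of daily volume."""
    inflow, outflow = BASE[kind]
    scale = rng.uniform(0.5, 2.0)  # stations differ in overall size
    row = [v * scale * rng.uniform(0.95, 1.05) for v in inflow + outflow]
    total = sum(row)
    return [v / total for v in row]  # normalizing removes the size effect


def pca(rows, k, iters=1000, seed=0):
    """Top-k principal components via power iteration with deflation."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    x = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(r[i] * r[j] for r in x) / (n - 1) for j in range(d)] for i in range(d)]
    rng = random.Random(seed)
    comps = []
    for _ in range(k):
        v = [rng.gauss(0, 1) for _ in range(d)]
        for _ in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(t * t for t in w)) or 1.0
            v = [t / norm for t in w]
        lam = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
        comps.append(v)
        # Deflation: subtract the found component so the next pass finds the next one.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return means, comps


def project(row, means, comps):
    """Latent features of one station: its coordinates on the principal components."""
    return [sum((r - m) * c for r, m, c in zip(row, means, comp)) for comp in comps]


def dist2(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))


def kmeans(points, k, iters=100):
    """Lloyd's k-means with deterministic farthest-point initialization."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist2(p, c) for c in centers)))
    labels = None
    for _ in range(iters):
        new = [min(range(k), key=lambda c: dist2(p, centers[c])) for p in points]
        if new == labels:
            break  # assignments are stable
        labels = new
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


rng = random.Random(42)
kinds = [kind for kind in BASE for _ in range(3)]  # 3 stations per type
stations = [noisy_profile(kind, rng) for kind in kinds]
means, comps = pca(stations, k=2)
latent = [project(s, means, comps) for s in stations]
labels, centers = kmeans(latent, k=3)

# Classify an "unknown" station by projecting it and picking the nearest center.
unknown = noisy_profile("commercial", rng)
z = project(unknown, means, comps)
guess = min(range(3), key=lambda c: dist2(z, centers[c]))
print("clusters:", list(zip(kinds, labels)))
print("unknown station -> cluster", guess)
```

Normalizing each station to shares of its daily volume keeps the first principal component from simply encoding station size. Whether the paper normalizes this way is an assumption of the sketch.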

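The prediction step can likewise be sketched for a single station under explicit assumptions. The latent features are learned from five weeks of synthetic hourly entries, using the mean daily profile plus the leading principal component. The first `OBSERVED` hours of a new day then give that day's latent score by least squares, and the remaining hours are reconstructed from it. For comparison, the sketch uses one simple periodicity baseline, “same hour on the previous day”. This baseline is an assumed stand-in, since the text does not specify its two naive periodicity approaches, and the neural network baseline is omitted. All data, names, and parameters (`BACKGROUND`, `COMMUTE`, `OBSERVED`, `fit_latent_model`, `predict_rest_of_day`) are hypothetical.

```python
import math
import random

HOURS = 24
OBSERVED = 10  # hours 0-9 of the new day are observed; hours 10-23 are predicted


def bump(center, width, height):
    return [height * math.exp(-(((h - center) / width) ** 2)) for h in range(HOURS)]


# Made-up hourly shapes, NOT real fare card data.
BACKGROUND = [200 + b for b in bump(13, 4, 300)]
COMMUTE = [m + e for m, e in zip(bump(8, 1.5, 1000), bump(18, 1.5, 900))]


def synthetic_day(commute_level, rng):
    """One day of hourly entries: background + scaled commuter peaks + noise."""
    return [b + commute_level * c + rng.gauss(0, 20) for b, c in zip(BACKGROUND, COMMUTE)]


def fit_latent_model(days, iters=300, seed=0):
    """Mean daily profile and leading principal component (power iteration)."""
    n = len(days)
    mean = [sum(d[h] for d in days) / n for h in range(HOURS)]
    x = [[d[h] - mean[h] for h in range(HOURS)] for d in days]
    cov = [[sum(r[i] * r[j] for r in x) / (n - 1) for j in range(HOURS)] for i in range(HOURS)]
    rng = random.Random(seed)
    v = [rng.gauss(0, 1) for _ in range(HOURS)]
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(HOURS)) for i in range(HOURS)]
        norm = math.sqrt(sum(t * t for t in w))
        v = [t / norm for t in w]
    return mean, v


def predict_rest_of_day(mean, comp, observed):
    """Least-squares latent score from the observed hours, then reconstruct the rest."""
    k = len(observed)
    num = sum((observed[h] - mean[h]) * comp[h] for h in range(k))
    den = sum(comp[h] ** 2 for h in range(k))
    score = num / den
    return [mean[h] + score * comp[h] for h in range(k, HOURS)]


def mae(pred, actual):
    return sum(abs(p - a) for p, a in zip(pred, actual)) / len(actual)


rng = random.Random(1)
# Five weeks of history, day 0 = Monday: commuter peaks on weekdays only.
history = [synthetic_day(rng.uniform(0.9, 1.1) if d % 7 < 5 else 0.15, rng) for d in range(35)]
mean, comp = fit_latent_model(history)

today = synthetic_day(1.0, rng)  # a Monday; the last history day was a Sunday
latent_pred = predict_rest_of_day(mean, comp, today[:OBSERVED])
naive_pred = history[-1][OBSERVED:]  # "same hour yesterday" periodicity baseline

latent_err = mae(latent_pred, today[OBSERVED:])
naive_err = mae(naive_pred, today[OBSERVED:])
print(f"latent-feature MAE: {latent_err:.1f}  previous-day MAE: {naive_err:.1f}")
```

Because the new day is a Monday that follows a Sunday, the previous-day baseline misses the evening commuter peak. The latent score estimated from the morning hours recovers that peak. On real data, the gap would depend on how well a few components describe the daily profiles.
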