Big data in public transportation: a review of sources and methods

ABSTRACT The collection of big data, as an alternative to traditional resource-intensive manual data collection approaches, has become significantly more feasible over the past decade. The availability of such data, coupled with more sophisticated predictive statistical techniques, has contributed to an increase in attention towards the application of these data, particularly for transportation analysis. Within the transportation literature, there is a growing emphasis on developing sources of commonly collected public transportation data into more powerful analytical tools. A commonly held belief is that application of big data to transportation problems will yield new insights previously unattainable through traditional transportation data sets. However, there exist many ambiguities related to what constitutes big data, the ethical implications of big data collection and application, and how to best utilize the emerging data sets. The existing literature exploring big data provides no clear and consistent definition. While the collection of big data has grown and its application in both research and practice continues to expand, there is a significant disparity between methods of analysis applied to such data. This paper summarizes the recent literature on sources of big data and commonly applied methods used in its application to public transportation problems. We assess predominant big data sources, most frequently studied topics, and methodologies employed. The literature suggests smart card and automated data are the two big data sources most frequently used by researchers to conduct public transit analyses. The studies reviewed indicate that big data has largely been used to understand transit users’ travel behavior and to assess public transit service quality. The techniques reported in the literature largely mirror those used with smaller data sets. The application of more advanced statistical methods, commonly associated with big data, has been limited to a small number of studies. In order to fully capture the value of big data, new approaches to analysis will be necessary.
