Why so many people? Explaining Nonhabitual Transport Overcrowding With Internet Data

Public transport smartcard data can be used to detect large crowds. By comparing observed counts against statistics on habitual behavior (e.g., the average by time of day), one can specifically identify nonhabitual crowds, which are often highly problematic for transport systems. While habitual overcrowding (e.g., at peak hour) is well understood by both traffic managers and travelers, nonhabitual overcrowding hotspots can be even more disruptive and unpleasant because they are generally unexpected. By quickly understanding such cases, a transport manager can react and mitigate disruptions to the transport system. We propose a probabilistic data analysis model that breaks each nonhabitual overcrowding hotspot into a set of explanatory components. The candidate explanatory components are first retrieved from social networks and special-event websites and then processed with text-analysis techniques. Finally, for each component, the probabilistic model estimates its share of the total overcrowding counts. We first validate the model with synthetic data and then test it with real data from the public transport system (EZLink) of Singapore, focusing on three case study areas. We demonstrate that the model generates explanations that are intuitively plausible and consistent both locally (correlation coefficient, CC, from 85% to 99% for the three areas) and globally (CC from 41.2% to 83.9%). The model is directly applicable to any other domain sensitive to crowd formation due to large social events (e.g., communications, water, energy, waste).
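
The comparison against habitual behavior mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' method: the mean-plus-k-standard-deviations threshold, the function names (`habitual_baseline`, `nonhabitual_hotspots`), and the toy counts are all assumptions chosen for illustration only.

```python
from statistics import mean, pstdev


def habitual_baseline(history):
    """Return (mean, std) per time slot.

    history: list of days, each a list of arrival counts per time slot
    (hypothetical layout, assumed for this sketch).
    """
    slots = zip(*history)  # regroup counts by time slot across days
    return [(mean(s), pstdev(s)) for s in slots]


def nonhabitual_hotspots(history, observed, k=3.0):
    """Flag slots whose observed count exceeds mean + k * std.

    Returns a list of (slot_index, excess_over_mean) pairs. The
    threshold rule is an assumption, not taken from the paper.
    """
    hotspots = []
    for slot, (count, (mu, sigma)) in enumerate(
        zip(observed, habitual_baseline(history))
    ):
        if count > mu + k * sigma:
            hotspots.append((slot, count - mu))
    return hotspots


# Toy data: three historical days, three time slots each.
history = [
    [100, 400, 120],
    [110, 420, 130],
    [90, 380, 110],
]
observed = [105, 410, 300]  # the last slot is unusually crowded

hotspots = nonhabitual_hotspots(history, observed)
print(hotspots)
```

In this toy case only the last slot is flagged. The excess over the habitual mean is the kind of quantity an explanatory model would then attribute to candidate events.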
