Inferring modes of transportation using mobile phone data

Cities are growing rapidly, and transportation networks need to adapt accordingly. To design, plan, and manage these networks, domain experts need data that reflect how people move from one place to another, at what times, for what purpose, and using which mode(s) of transportation. However, traditional data collection methods are neither cost-effective nor timely. For instance, travel surveys are very expensive, rely on relatively small samples of people, and are conducted every ten years, an interval too long to keep pace with rapid urban change. In this paper, we propose an algorithmic pipeline that infers the distribution of transportation mode usage in a city from mobile phone network data. Our pipeline is based on a Topic-Supervised Non-Negative Matrix Factorization (TS-NMF) model and applies a Weak-Labeling strategy to user trajectories, using data from open datasets such as GTFS and OpenStreetMap. As a case study, we present results for Santiago, Chile, a city with a sophisticated intermodal public transportation system. Importantly, our pipeline delivers coherent, explainable results, with interpretable parameters at each step. Finally, we discuss the potential applications and implications of such a system for transportation and urban planning.
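The idea of topic supervision can be illustrated with a minimal sketch. It assumes one common formulation, in which weak labels act as a binary mask L on the trajectory-topic matrix W: an entry L[i, t] = 0 forbids trajectory i from loading on topic (mode) t. The model X ≈ (W ∘ L) H is then fit with the standard Lee-Seung multiplicative updates for the squared Frobenius loss. Because these updates only ever multiply entries by non-negative ratios, starting W at zero wherever L forbids a topic keeps those entries at zero, which enforces the supervision. The function name `ts_nmf`, the toy feature matrix, and the label mask are hypothetical illustrations, not the authors' implementation or data.

```python
import numpy as np


def ts_nmf(X, L, n_iter=500, eps=1e-10, seed=0):
    """Sketch of topic-supervised NMF: fit X ~= W @ H with W forced to zero where L == 0.

    X: (n_trajectories, n_features) non-negative matrix.
    L: (n_trajectories, k) binary weak-label mask; L[i, t] == 0 means trajectory i
       may not load on topic t. Unlabeled trajectories have rows of all ones.
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    k = L.shape[1]
    W = rng.random((n, k)) * L  # forbidden entries start at exactly zero
    H = rng.random((k, m))
    for _ in range(n_iter):
        # Lee-Seung multiplicative updates for the squared Frobenius loss.
        # Each step multiplies by a non-negative ratio, so zeros in W stay zero.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H


# Hypothetical toy data: 4 trajectories described by 4 non-negative features
# (e.g. affinity to two rail-like and two road-like corridors; purely illustrative).
X = np.array([[5, 4, 0, 0],
              [4, 5, 1, 0],
              [0, 1, 5, 4],
              [0, 0, 4, 5]], dtype=float)

# Hypothetical weak labels: trajectory 0 may only use topic 0, trajectory 3 only
# topic 1; trajectories 1 and 2 are unlabeled and may load on either topic.
L = np.array([[1, 0],
              [1, 1],
              [1, 1],
              [0, 1]])

W, H = ts_nmf(X, L)
share = W / W.sum(axis=1, keepdims=True)  # per-trajectory topic (mode) distribution
print(share.round(2))
```

Each row of `share` can be read as a trajectory's distribution over topics; how features are built from trajectories, how many topics to use, and how rows are aggregated into a city-level mode split are design choices left open in this sketch.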