Exploiting Foursquare and Cellular Data to Infer User Activity in Urban Environments

Inferring the types of activity that take place in the neighborhoods of urban centers can be helpful in a number of contexts, including urban planning, content delivery, and activity recommendations for mobile web users, and may even lead to a deeper understanding of the geographical evolution of social life in the city. During the past few years, the analysis of mobile phone usage patterns, or of social media with longitudinal attributes, has aided the automatic characterization of the dynamics of the urban environment. In this work, we combine a dataset sourced from a telecommunication provider in Spain with a database of millions of geotagged venues from Foursquare, and we formulate the problem of urban activity inference in a supervised learning framework. In particular, we exploit user communication patterns observed at the base station level in order to predict the activity of Foursquare users who check in at nearby venues. First, we mine a set of machine learning features that allow us to encode the input telecommunication signal of a tower. Subsequently, we evaluate a diverse set of supervised learning algorithms using labels extracted from Foursquare place categories, and we consider two application scenarios. In the first, we assess how hard it is to predict a specific urban activity of an area, showing that Nightlife and Entertainment spots are the easiest to infer, whereas College and Shopping areas feature the lowest accuracy rates. In the second, given a candidate set of activity types in a geographic area, we aim to select the most prominent one. We demonstrate that the difficulty of the problem increases with the number of classes incorporated in the prediction task, yet the classifiers still perform considerably better than a random guess even as the set of candidate classes grows.
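To make the second scenario concrete, the following is a minimal sketch of how a ground-truth label for a tower could be derived from nearby Foursquare venues, and of the random-guess baseline against which a classifier would be compared. The abstract does not specify how labels are built, so everything here is an assumption for illustration: the 0.5 km radius, the weighting by check-in counts, the alphabetical tie-break, and the names `tower_label` and `random_guess_accuracy` are hypothetical and are not taken from the paper.

```python
from collections import Counter
from math import radians, sin, cos, asin, sqrt


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * asin(sqrt(a))


def tower_label(tower, venues, candidates, radius_km=0.5):
    """Pick the candidate category with the most check-ins near a tower.

    Assumptions (not from the paper): a fixed radius around the tower,
    check-in counts as the measure of prominence, alphabetical tie-break.
    Returns None when no candidate venue lies within the radius.
    """
    counts = Counter()
    for v in venues:
        near = haversine_km(tower[0], tower[1], v["lat"], v["lon"]) <= radius_km
        if near and v["category"] in candidates:
            counts[v["category"]] += v["checkins"]
    if not counts:
        return None
    # sorted() makes the tie-break deterministic (alphabetical order).
    return max(sorted(counts), key=counts.get)


def random_guess_accuracy(n_classes):
    """Expected accuracy of a uniform random guess over n_classes labels."""
    return 1.0 / n_classes


# Hypothetical toy data: one tower and three venues (values are made up).
tower = (40.4168, -3.7038)
venues = [
    {"lat": 40.4170, "lon": -3.7040, "category": "Nightlife", "checkins": 120},
    {"lat": 40.4165, "lon": -3.7035, "category": "Shopping", "checkins": 80},
    {"lat": 40.5000, "lon": -3.7038, "category": "College", "checkins": 500},
]
label = tower_label(tower, venues, {"Nightlife", "Shopping", "College"})
```

In this toy example the College venue is too far from the tower to count, so the label is decided between Nightlife and Shopping. The baseline shrinks as candidate classes are added (1/2, 1/3, 1/4, ...), which is one reason the multi-class task becomes harder as the candidate set grows.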
