Shaping City Neighborhoods Leveraging Crowd Sensors

Location-based social networks (LBSNs) capture large amounts of data about the whereabouts of their users. This has become a social phenomenon that is changing how people normally communicate, and it opens new research perspectives on how to compute descriptive models from this collection of geospatial data. In this paper, we propose a methodology for clustering location-based information in order to provide first-glance summaries of geographic areas. Each summary is a composition of fingerprints, each fingerprint being a cluster generated by GeoSubClu, a new subspace clustering algorithm proposed in this paper. The algorithm is parameter-less: it automatically recognizes areas with a homogeneous density of similar points of interest and provides clusters richly characterized by their representative categories. We measure the validity of the generated clusters using both a qualitative and a quantitative evaluation. In the former, we benchmark the results of our methodology against an existing gold standard and compare them with two baselines. We then further validate the generated clusters with a quantitative analysis, over the same gold standard and a new geographic extent, using statistical validation measures. The results of the qualitative and quantitative experiments show the robustness of our approach in creating geographic clusters that are significant both for humans (an F-measure of 88.98% over the gold standard) and from a statistical point of view.
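
The abstract reports an F-measure but does not state how precision and recall were obtained from the gold standard. As a minimal sketch, assuming the standard definition of the F-measure as the (optionally weighted) harmonic mean of precision and recall, the score can be computed as follows. The function name `f_measure` and the `beta` parameter are illustrative and not taken from the paper.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (F1 when beta == 1).

    Illustrative helper only: how the paper derives precision and recall
    from its gold standard is not described in the abstract.
    """
    if not (0.0 <= precision <= 1.0 and 0.0 <= recall <= 1.0):
        raise ValueError("precision and recall must lie in [0, 1]")
    denominator = beta ** 2 * precision + recall
    if denominator == 0.0:
        # Both precision and recall are zero; define the score as zero.
        return 0.0
    return (1 + beta ** 2) * precision * recall / denominator


if __name__ == "__main__":
    # Hypothetical precision/recall values, chosen only to show the call.
    print(f_measure(0.9, 0.8))
```

With `beta = 1` precision and recall are weighted equally, which is the most common reading of an unqualified "F-measure".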
