Regions, Periods, Activities: Uncovering Urban Dynamics via Cross-Modal Representation Learning

As urbanization accelerates, systematically modeling people's activities in urban space is increasingly recognized as a crucial socioeconomic task. Years ago this task was nearly impossible for lack of reliable data sources, but the emergence of geo-tagged social media (GTSM) data has opened new possibilities. Recent studies have made progress on discovering geographical topics from GTSM data. However, their high computational costs and strong distributional assumptions about the latent topics prevent them from fully exploiting GTSM data. To bridge this gap, we present CrossMap, a novel cross-modal representation learning method that uncovers urban dynamics from massive GTSM data. CrossMap first employs an accelerated mode-seeking procedure to detect the spatiotemporal hotspots underlying people's activities. The detected hotspots not only capture spatiotemporal variations, but also largely alleviate the sparsity of GTSM data. With the detected hotspots, CrossMap then jointly embeds all spatial, temporal, and textual units into the same space using one of two strategies: one reconstruction-based, the other graph-based. Both strategies capture the correlations among the units by encoding their co-occurrence and neighborhood relationships, and learn low-dimensional representations that preserve those correlations. Our experiments demonstrate that CrossMap not only significantly outperforms state-of-the-art methods for activity recovery and classification, but is also considerably more efficient.
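To make the hotspot-detection step more concrete, the following is a minimal sketch of plain Gaussian mean shift, a standard mode-seeking procedure, applied to 2D coordinates. It is not CrossMap's accelerated procedure: the function names (`mean_shift_modes`, `group_modes`), the `bandwidth` and `merge_radius` parameters, and the toy coordinates are all assumptions made for illustration, and only the spatial case is shown.

```python
import math


def mean_shift_modes(points, bandwidth=1.0, max_iter=100, tol=1e-6):
    """Move each point uphill to a local density maximum (Gaussian kernel).

    Illustrative, unaccelerated mean shift: O(n^2) work per iteration.
    """
    modes = []
    for start in points:
        x = list(start)
        for _ in range(max_iter):
            # Gaussian kernel weight of every data point relative to x.
            weights = [
                math.exp(-sum((a - b) ** 2 for a, b in zip(x, p))
                         / (2 * bandwidth ** 2))
                for p in points
            ]
            total = sum(weights)
            # The weighted mean of the data is the next position of x.
            new_x = [
                sum(w * p[d] for w, p in zip(weights, points)) / total
                for d in range(len(x))
            ]
            shift = math.dist(x, new_x)
            x = new_x
            if shift < tol:
                break
        modes.append(tuple(x))
    return modes


def group_modes(modes, merge_radius):
    """Merge converged modes closer than merge_radius into hotspots.

    Returns (centers, labels), where labels[i] is the hotspot index
    assigned to the i-th input point.
    """
    centers, labels = [], []
    for m in modes:
        for i, c in enumerate(centers):
            if math.dist(m, c) < merge_radius:
                labels.append(i)
                break
        else:
            centers.append(m)
            labels.append(len(centers) - 1)
    return centers, labels


# Hypothetical toy data: two small groups of geo-tagged records.
records = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
           (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
modes = mean_shift_modes(records, bandwidth=0.5)
centers, labels = group_modes(modes, merge_radius=0.5)
```

In this sketch every record ends up labeled with the hotspot it converged to, which is how hotspot detection can reduce sparsity: sparse raw coordinates are replaced by a small set of shared spatial units. The same idea would apply to timestamps to obtain temporal hotspots, under the additional assumption that time of day is treated as a one-dimensional (possibly circular) feature.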
