GeoScope: Online Detection of Geo-Correlated Information Trends in Social Networks

The First Law of Geography states that "everything is related to everything else, but near things are more related than distant things." This spatial significance has implications for many applications, trend detection among them. In this paper, we propose GeoScope, a new algorithmic tool for detecting geo-trends. GeoScope is a data-stream solution that detects correlations between topics and locations within a sliding window, in addition to analyzing topics and locations independently. It offers theoretical guarantees for detecting all trending correlated pairs while requiring only sub-linear space and running time. We perform several human validation tasks to demonstrate the value of GeoScope. The results show that human judges prefer GeoScope to the best-performing baseline solution 4:1 in terms of the geographical significance of the presented information. As the Twitter analysis demonstrates, GeoScope successfully filters out topics without geo-intent and detects various local interests, such as emergency events, political demonstrations, or cultural events. Experiments on Twitter show that GeoScope has perfect recall and near-perfect precision.
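To make the idea of a topic-location correlation in a sliding window concrete, the following is a minimal sketch and not the GeoScope algorithm itself. It keeps exact counts, so its space is linear in the window size, whereas GeoScope is described as sub-linear. The class name `SlidingPairCounter` and the thresholds `theta`, `phi`, and `psi` are hypothetical, introduced only for illustration: a pair is reported when the location is frequent in the window, the topic is frequent within that location, and the location is frequent within that topic.

```python
from collections import Counter, deque


class SlidingPairCounter:
    """Exact (location, topic) counts over the last `window` items.

    Illustrative only: exact counting needs linear space, unlike the
    sub-linear guarantees the abstract attributes to GeoScope.
    """

    def __init__(self, window):
        self.window = window
        self.items = deque()      # items currently inside the window
        self.pair = Counter()     # (location, topic) -> count
        self.loc = Counter()      # location -> count
        self.topic = Counter()    # topic -> count

    def add(self, location, topic):
        """Insert one item and expire the oldest if the window is full."""
        self.items.append((location, topic))
        self.pair[(location, topic)] += 1
        self.loc[location] += 1
        self.topic[topic] += 1
        if len(self.items) > self.window:
            old_loc, old_topic = self.items.popleft()
            self.pair[(old_loc, old_topic)] -= 1
            self.loc[old_loc] -= 1
            self.topic[old_topic] -= 1

    def correlated_pairs(self, theta, phi, psi):
        """Return pairs passing all three (hypothetical) thresholds."""
        n = len(self.items)
        found = []
        for (location, topic), count in self.pair.items():
            if count <= 0:
                continue  # pair has fully expired from the window
            if (self.loc[location] >= theta * n            # location is frequent
                    and count >= phi * self.loc[location]  # topic dominates location
                    and count >= psi * self.topic[topic]): # location dominates topic
                found.append((location, topic))
        return sorted(found)


# Made-up stream: "protest" is concentrated in one place, "lol" is spread out.
counter = SlidingPairCounter(window=6)
for location, topic in [("Cairo", "protest")] * 3 + [
        ("NYC", "lol"), ("Paris", "lol"), ("Tokyo", "lol")]:
    counter.add(location, topic)
print(counter.correlated_pairs(theta=0.1, phi=0.5, psi=0.5))
```

In this toy stream, only the concentrated pair passes all three tests. The topic "lol" is just as frequent overall, but no single location accounts for enough of its occurrences, which mirrors the filtering of topics without geo-intent that the abstract describes.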
