Power-law Verification for Event Detection at Multi-spatial Scales from Geo-tagged Tweet Streams

Compared with traditional news media, social media now provides a richer and more timely source of news. We study event detection at multiple spatial scales from geo-tagged tweet streams. Specifically, in this paper we (1) examine the statistical characteristics of the time series formed by the number of geo-tagged tweets posted from a given region in each short time interval, e.g., ten seconds or one minute; (2) verify on over thirty datasets that, while almost all such time series exhibit self-similarity, those that correspond to events, especially short-term and unplanned outbursts, follow a power-law distribution; and (3) show that these findings can facilitate event detection from tweet streams: we propose a simple algorithm that only checks whether the time series at multiple spatial scales follow power-law distributions, without examining the content of any tweet. Our experiments on multiple datasets show that, using only the spatio-temporal statistical distribution of tweets, this seemingly naive algorithm achieves results comparable to those of event detection methods that perform semantic analysis. We further discuss how to integrate the proposed technique into existing algorithms to improve their performance.
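The content-free detection idea in (3) can be sketched as follows. This is a minimal illustration under our own assumptions, not the authors' implementation. A uniform 2**level by 2**level grid stands in for each quadtree level. "The time series follows a power law" is read here as "the per-interval counts, treated as integer samples, fit a discrete power law above xmin". The exponent is estimated by discrete maximum likelihood (the Hurwitz-zeta likelihood described by Clauset, Shalizi and Newman), and fit quality is judged by a Kolmogorov-Smirnov distance against a placeholder threshold. All function names (count_series, hurwitz_zeta, fit_alpha, power_law_fit, looks_like_power_law, detect) and all parameter values are hypothetical.

```python
import math
from collections import Counter, defaultdict


def count_series(tweets, level, interval, bbox):
    """Tweet counts per grid cell per time interval at one spatial scale.

    tweets: (lat, lon, unix_seconds) triples, assumed to lie inside bbox.
    bbox: (min_lat, min_lon, max_lat, max_lon). A uniform 2**level grid
    stands in for one quadtree level; empty cells are omitted.
    """
    tweets = list(tweets)
    lat0, lon0, lat1, lon1 = bbox
    n = 2 ** level
    t0 = min(t for _, _, t in tweets)
    steps = int((max(t for _, _, t in tweets) - t0) // interval) + 1
    series = defaultdict(lambda: [0] * steps)
    for lat, lon, t in tweets:
        row = min(int((lat - lat0) / (lat1 - lat0) * n), n - 1)
        col = min(int((lon - lon0) / (lon1 - lon0) * n), n - 1)
        series[(row, col)][int((t - t0) // interval)] += 1
    return dict(series)


def hurwitz_zeta(s, q, terms=1000):
    """sum_{k>=0} (q + k)**-s for s > 1: partial sum plus Euler-Maclaurin tail."""
    a = q + terms
    head = math.fsum((q + k) ** -s for k in range(terms))
    return head + a ** (1 - s) / (s - 1) + 0.5 * a ** -s


def fit_alpha(tail, xmin, lo=1.01, hi=6.0, iters=60):
    """Discrete power-law MLE of alpha via golden-section search on the
    negative log-likelihood n*log(zeta(alpha, xmin)) + alpha*sum(log x),
    which is convex in alpha."""
    n, s_log = len(tail), math.fsum(math.log(x) for x in tail)

    def nll(a):
        return n * math.log(hurwitz_zeta(a, xmin)) + a * s_log

    g = (math.sqrt(5) - 1) / 2
    for _ in range(iters):
        m1, m2 = hi - g * (hi - lo), lo + g * (hi - lo)
        if nll(m1) < nll(m2):
            hi = m2
        else:
            lo = m1
    return (lo + hi) / 2


def power_law_fit(samples, xmin=1):
    """Fit integer samples >= xmin; return (alpha, ks) or None if too few.

    ks is the largest gap between the empirical and fitted CDFs."""
    tail = [x for x in samples if x >= xmin]
    if len(tail) < 2:
        return None
    alpha = fit_alpha(tail, xmin)
    z = hurwitz_zeta(alpha, xmin)
    freq, n, cum, ks = Counter(tail), len(tail), 0, 0.0
    for v in sorted(freq):
        cum += freq[v]
        model_cdf = 1 - hurwitz_zeta(alpha, v + 1) / z  # P(X <= v)
        ks = max(ks, abs(cum / n - model_cdf))
    return alpha, ks


def looks_like_power_law(samples, xmin=1, max_ks=0.1, min_tail=50):
    """Hypothetical decision rule; thresholds are placeholders. A rigorous
    test would add a bootstrap goodness-of-fit p-value."""
    if sum(1 for x in samples if x >= xmin) < min_tail:
        return False
    fit = power_law_fit(samples, xmin)
    return fit is not None and fit[1] <= max_ks


def detect(tweets, bbox, levels=(0, 1, 2, 3), interval=60.0):
    """Return (level, cell) pairs whose count series passes the check."""
    tweets = list(tweets)
    return [
        (level, cell)
        for level in levels
        for cell, counts in count_series(tweets, level, interval, bbox).items()
        if looks_like_power_law(counts)
    ]
```

The sketch uses the exact discrete likelihood rather than the closed-form continuous estimator 1 + n / sum(ln(x / xmin)), because per-interval tweet counts are small integers, where the continuous approximation is noticeably biased.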
