STREAMCUBE: Hierarchical spatio-temporal hashtag clustering for event exploration over the Twitter stream

What is happening around the world? When and where? Mining the geo-tagged Twitter stream makes it possible to answer these questions in real time. Although a single tweet is short and noisy, proper aggregation of many tweets can yield meaningful results. In this paper, we focus on hierarchical spatio-temporal hashtag clustering techniques. Our system has the following features: (1) Exploring events (hashtag clusters) at different spatial granularities: users can zoom in and out on a map to find out what is happening in a particular area. (2) Exploring events at different temporal granularities: users can choose to see what is happening today or over the past week. (3) An efficient single-pass algorithm for event identification that produces human-readable hashtag clusters. (4) Efficient event ranking that finds bursty events and localized events for a given region and time frame. To support aggregation at different spatial and temporal granularities, we propose a data structure called STREAMCUBE, which extends the data cube structure from the database community with spatial and temporal hierarchies. To achieve high scalability, we propose a divide-and-conquer method for constructing the STREAMCUBE. To support flexible event ranking with different weights, we propose a top-k based index. Several efficient methods are used to speed up event similarity computation. Finally, we conducted extensive experiments on a real Twitter dataset. Experimental results show that our framework provides meaningful results with high scalability.
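The core idea of materializing hashtag counts at several spatial and temporal granularities, so that zooming in or out on the map or switching from "today" to "past week" becomes a single lookup, can be sketched as follows. This is a minimal illustration under our own assumptions, not the STREAMCUBE implementation. The 2x2 grid subdivision, the hour/day/week buckets, the names `StreamCubeSketch`, `space_cell`, `counts` and `rank`, and the burst/locality score are all hypothetical. The sketch also counts individual hashtags rather than forming the hashtag clusters described in the paper, and its ranking is a brute-force scan rather than a top-k index.

```python
from collections import Counter, defaultdict

# Hypothetical hierarchy parameters, chosen for illustration only.
MAX_SPACE_LEVEL = 3  # level 0 is the whole globe; each level splits a cell into 2x2 quadrants
TIME_LEVELS = {"hour": 3600, "day": 86400, "week": 7 * 86400}  # bucket sizes in seconds


def space_cell(lat, lon, level):
    """Return the (row, col) of the grid cell containing (lat, lon) at a given level."""
    n = 2 ** level  # cells per axis at this level
    row = min(int((lat + 90.0) / 180.0 * n), n - 1)
    col = min(int((lon + 180.0) / 360.0 * n), n - 1)
    return row, col


class StreamCubeSketch:
    """Hashtag counts materialized at every (space level, time level) combination."""

    def __init__(self):
        # Key: (space_level, cell, time_level, time_bucket) -> Counter of hashtags.
        self.cells = defaultdict(Counter)

    def insert(self, lat, lon, ts, hashtags):
        """Single pass per tweet: add its hashtags to every ancestor cell."""
        tags = list(dict.fromkeys(hashtags))  # count each hashtag once per tweet
        for level in range(MAX_SPACE_LEVEL + 1):
            cell = space_cell(lat, lon, level)
            for time_level, size in TIME_LEVELS.items():
                self.cells[(level, cell, time_level, int(ts) // size)].update(tags)

    def counts(self, lat, lon, level, ts, time_level, bucket_offset=0):
        """Hashtag counts for the cell containing (lat, lon) and the bucket containing ts."""
        bucket = int(ts) // TIME_LEVELS[time_level] + bucket_offset
        key = (level, space_cell(lat, lon, level), time_level, bucket)
        return self.cells.get(key, Counter())  # .get avoids creating empty cells on reads

    def rank(self, lat, lon, level, ts, time_level, k, w_burst=0.5, w_local=0.5):
        """Top-k hashtags by a hypothetical weighted burst/locality score (full scan)."""
        now = self.counts(lat, lon, level, ts, time_level)
        prev = self.counts(lat, lon, level, ts, time_level, bucket_offset=-1)
        world = self.counts(lat, lon, 0, ts, time_level)  # level 0 covers the whole globe
        scores = {}
        for tag, c in now.items():
            burst = c / (prev[tag] + 1)  # growth versus the previous time bucket
            local = c / world[tag]       # share of global mentions that fall in this cell
            scores[tag] = w_burst * burst + w_local * local
        return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


cube = StreamCubeSketch()
t0 = 3_600_000  # an hour boundary, in seconds
cube.insert(40.7, -74.0, t0 - 1800, ["#nyc"])            # previous hour, New York
cube.insert(40.7, -74.0, t0 + 60, ["#fire", "#nyc"])     # current hour, New York
cube.insert(40.7, -74.0, t0 + 60, ["#fire"])
cube.insert(51.5, -0.1, t0 + 60, ["#nyc"])               # current hour, London
print(cube.rank(40.7, -74.0, 3, t0 + 60, "hour", k=2))   # [('#fire', 1.5), ('#nyc', 0.5)]
```

Because every insert updates all ancestor cells, a query at any granularity reads a single counter. The cost is memory that grows with the number of hierarchy levels, which is one reason a scalable system would need something like the divide-and-conquer construction and top-k index the abstract mentions.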