Indexing Evolving Events from Tweet Streams

Tweet streams provide a variety of real-life and real-time information on social events that dynamically change over time. Although social event detection has been actively studied, how to efficiently monitor evolving events from continuous tweet streams remains an open and challenging problem. One common approach for event detection from text streams is to use single-pass incremental clustering. However, this approach does not track the evolution of events, nor does it address the issue of efficient monitoring in the presence of a large number of events. In this paper, we capture the dynamics of events using four event operations (create, absorb, split, and merge), which can be effectively used to monitor evolving events. Moreover, we propose a novel event indexing structure, called Multi-layer Inverted List (MIL), to manage dynamic event databases and accelerate large-scale event search and update. We thoroughly study the problem of nearest neighbour search using MIL based on upper bound pruning, along with incremental index maintenance. Extensive experiments have been conducted on a large-scale real-life tweet dataset. The results demonstrate the promising performance of our event indexing and monitoring methods in terms of both efficiency and effectiveness.
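
To make the baseline concrete, the following is a minimal sketch of single-pass incremental clustering with only the create and absorb operations. It is an illustration under stated assumptions, not the method of the paper: events are modelled as plain term sets, similarity is Jaccard, the index is a flat single-layer inverted list rather than MIL, no upper bound pruning is applied, and split and merge are omitted. The names `EventStore`, `nearest`, `process`, and the `threshold` value are hypothetical.

```python
from collections import defaultdict


class EventStore:
    """Toy event database: events are term sets behind a flat inverted list.

    Assumption: this is NOT the Multi-layer Inverted List (MIL); it only
    illustrates index-assisted nearest neighbour search over events.
    """

    def __init__(self, threshold=0.3):
        self.threshold = threshold      # hypothetical similarity cut-off
        self.events = {}                # event id -> set of terms
        self.index = defaultdict(set)   # term -> ids of events containing it
        self.next_id = 0

    def nearest(self, terms):
        """Return (event_id, jaccard) of the most similar event, or (None, 0.0)."""
        overlap = defaultdict(int)
        for t in terms:                 # only events sharing a term are visited
            for eid in self.index.get(t, ()):
                overlap[eid] += 1
        best_id, best_sim = None, 0.0
        for eid, shared in overlap.items():
            sim = shared / (len(terms) + len(self.events[eid]) - shared)
            if sim > best_sim:
                best_id, best_sim = eid, sim
        return best_id, best_sim

    def process(self, tweet):
        """Single pass: absorb the tweet into its nearest event or create one."""
        terms = set(tweet.lower().split())
        eid, sim = self.nearest(terms)
        if eid is not None and sim >= self.threshold:
            op = "absorb"
            self.events[eid] |= terms   # the event grows with the new tweet
        else:
            op = "create"
            eid = self.next_id
            self.next_id += 1
            self.events[eid] = set(terms)
        for t in terms:                 # incremental index maintenance
            self.index[t].add(eid)
        return op, eid


store = EventStore(threshold=0.3)
for tweet in ["earthquake hits tokyo",
              "tokyo earthquake damage",
              "world cup final tonight"]:
    print(store.process(tweet))
```

Because each tweet is compared only against events that share at least one term with it, the inverted list avoids a full scan of the event database. The sketch never revisits an event after a tweet is absorbed, which is the limitation the split and merge operations are meant to address.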

Zi Huang, et al. Indexing evolving events from tweet streams. 2016 IEEE 32nd International Conference on Data Engineering (ICDE).