Bursty event detection from microblog: a distributed and incremental approach

As a new form of social media, microblogs (e.g., Twitter and Weibo) are playing an important role in people's daily life. With the rise in popularity and size of microblogs, there is a need for distributed approaches that can detect bursty event with low latency from the short‐text data stream. In this paper, we propose a distributed and incremental temporal topic model for microblogs called Bursty Event dEtection (BEE+). BEE+ is able to detect bursty events from short‐text dataset and model the temporal information. And BEE+ processes the post‐stream incrementally to track the topic drifting of events over time. Therefore, the latent semantic indices are preserved from one time period to the next. In order to achieve real‐time processing, we design a distributed execution framework based on Spark engine. To verify its ability to detect bursty event, we conduct experiments on a Weibo dataset of 6,360,125 posts. The results show that BEE+ can outperform the baselines for detecting the meaningful bursty events and track the topic drifting. Copyright © 2015 John Wiley & Sons, Ltd.

[1]  Zhoujun Li,et al.  Emerging topic detection for organizations from microblogs , 2013, SIGIR.

[2]  Richard Sproat,et al.  Mining correlated bursty topic patterns from coordinated text streams , 2007, KDD '07.

[3]  Richard A. Harshman,et al.  Indexing by Latent Semantic Analysis , 1990, J. Am. Soc. Inf. Sci..

[4]  Laks V. S. Lakshmanan,et al.  Incremental cluster evolution tracking from highly dynamic network data , 2014, 2014 IEEE 30th International Conference on Data Engineering.

[5]  Michael I. Jordan,et al.  Latent Dirichlet Allocation , 2001, J. Mach. Learn. Res..

[6]  Bu-Sung Lee,et al.  Event Detection in Twitter , 2011, ICWSM.

[7]  Qing He,et al.  Parallel Implementation of Apriori Algorithm Based on MapReduce , 2012, 2012 13th ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing.

[8]  Brian D. Davison,et al.  Empirical study of topic modeling in Twitter , 2010, SOMA '10.

[9]  Jianxin Li,et al.  Online Bursty Event Detection from Microblog , 2014, 2014 IEEE/ACM 7th International Conference on Utility and Cloud Computing.

[10]  Andrew McCallum,et al.  Topics over time: a non-Markov continuous-time model of topical trends , 2006, KDD '06.

[11]  Xiaohui Yan,et al.  A biterm topic model for short texts , 2013, WWW.

[12]  Michal Rosen-Zvi,et al.  Hidden Topic Markov Models , 2007, AISTATS.

[13]  Chenliang Li,et al.  Twevent: segment-based event detection from tweets , 2012, CIKM.

[14]  David M. Blei,et al.  Syntactic Topic Models , 2008, NIPS.

[15]  Ee-Peng Lim,et al.  Finding Bursty Topics from Microblogs , 2012, ACL.

[16]  Michael J. Franklin,et al.  Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing , 2012, NSDI.

[17]  Michael S. Bernstein,et al.  Twitinfo: aggregating and visualizing microblogs for event exploration , 2011, CHI.

[18]  Hongfei Yan,et al.  Comparing Twitter and Traditional Media Using Topic Models , 2011, ECIR.

[19]  Yutaka Matsuo,et al.  Earthquake shakes Twitter users: real-time event detection by social sensors , 2010, WWW '10.

[20]  T. Murata,et al.  Breaking News Detection and Tracking in Twitter , 2010, 2010 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology.

[21]  Nick Koudas,et al.  TwitterMonitor: trend detection over the twitter stream , 2010, SIGMOD Conference.

[22]  Ke Wang,et al.  TopicSketch: Real-Time Bursty Topic Detection from Twitter , 2013, 2013 IEEE 13th International Conference on Data Mining.

[23]  Rui Li,et al.  TEDAS: A Twitter-based Event Detection and Analysis System , 2012, 2012 IEEE 28th International Conference on Data Engineering.

[24]  Thomas Hofmann,et al.  Probabilistic Latent Semantic Indexing , 1999, SIGIR Forum.