Burst prediction from Weibo: A crowd-sensing and tweet-centric method

Online social media such as Weibo generate copious, up-to-the-minute information about real-world events of all kinds. How to detect emergent events effectively and efficiently from massive tweet streams is now drawing attention from a wide range of parties with diverse interests. Despite a wealth of previous research, bursty topic detection (or burst prediction) remains a major challenge because of characteristics of the data such as the sparseness of usable information and the enormous amount of noise in each data set. In particular, conventional term-frequency-based approaches may not be appropriate in this context, since the propagation of Weibo events is typically driven by a few influential posts, which are often buried among irrelevant, noisy tweets containing frequent but trivial terms. Other commonly used methods, such as topic models, may also fail to detect bursty events because tweets are highly sparse and the models are computationally expensive. In light of this, this paper proposes a crowd-sensing, tweet-centric method for burst prediction. Specifically, we first select influential users on Weibo as social sensors that perceive bursty events through their posts or reposts. This step is crucial for excluding interference from unimportant tweets and for reducing computational cost. All collected tweets are then filtered to remove uninteresting topics, and the remaining tweets are monitored for possible bursts by modeling the time series of their retweet occurrences. Extensive experiments on large real-world Weibo datasets demonstrate both the efficiency and the effectiveness of our approach. In particular, the tweet-centric design of our method attaches rich semantics to the detected bursts, which makes it of great practical value.
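The pipeline above (sensor selection, then monitoring retweet time series) can be illustrated with a small sketch. The abstract does not say how influence is scored or how the retweet time series is modeled, so everything here is an assumption: `Tweet`, `select_sensors`, `retweet_series`, and `detect_bursts` are hypothetical names; influence is taken as any precomputed per-user score; time is binned into fixed slots; and a burst is flagged when a slot's retweet count exceeds an exponentially weighted moving average (EWMA) baseline by a z-score threshold. The EWMA test is a generic stand-in, not the authors' model, and the topic-filtering step is omitted.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class Tweet:
    """Hypothetical minimal tweet record (not the paper's data schema)."""
    tweet_id: str
    user_id: str
    timestamp: int  # seconds since the start of the observation window
    retweet_of: Optional[str] = None  # id of the original tweet, if this is a repost


def select_sensors(influence: dict[str, float], k: int) -> set[str]:
    """Pick the k users with the highest (assumed, precomputed) influence score."""
    ranked = sorted(influence, key=influence.get, reverse=True)
    return set(ranked[:k])


def retweet_series(tweets: list[Tweet], sensors: set[str],
                   slot_seconds: int, num_slots: int,
                   start: int = 0) -> dict[str, list[int]]:
    """Per-slot retweet counts for each original tweet a sensor posted or reposted."""
    watched: set[str] = set()
    for t in tweets:
        if t.user_id in sensors:
            # A sensor's own post, or the original behind a sensor's repost.
            watched.add(t.retweet_of if t.retweet_of is not None else t.tweet_id)
    series = {tid: [0] * num_slots for tid in watched}
    for t in tweets:
        # Retweets by anyone (the crowd) count toward a watched tweet's series.
        if t.retweet_of in watched:
            slot = (t.timestamp - start) // slot_seconds
            if 0 <= slot < num_slots:
                series[t.retweet_of][slot] += 1
    return series


def detect_bursts(counts: list[int], alpha: float = 0.3,
                  threshold: float = 3.0, min_count: int = 5) -> list[int]:
    """Return slot indices whose count jumps above an EWMA baseline.

    A slot is flagged when count >= min_count and
    count - mean > threshold * max(std, 1.0), where mean and std are
    exponentially weighted estimates over the preceding slots.
    """
    if not counts:
        return []
    mean, var = float(counts[0]), 0.0
    bursts = []
    for i in range(1, len(counts)):
        x = counts[i]
        std = max(math.sqrt(var), 1.0)  # floor avoids flagging tiny wiggles on flat series
        if x >= min_count and x - mean > threshold * std:
            bursts.append(i)
        # Incremental EWMA update of the mean and variance.
        diff = x - mean
        mean += alpha * diff
        var = (1 - alpha) * (var + alpha * diff * diff)
    return bursts
```

In use, one would call `retweet_series` on the sensors' tweets and then run `detect_bursts` on each watched tweet's series. Because the EWMA baseline adapts, this flags the onset of a surge rather than every slot of a sustained high level; `alpha`, `threshold`, and `min_count` are illustrative defaults, not values from the paper.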
