A unified model for stable and temporal topic detection from social media data

Web 2.0 users generate and spread huge numbers of messages in online social media. Such user-generated content is a mixture of temporal topics (e.g., breaking events) and stable topics (e.g., user interests). Because the two kinds of topics differ in nature, it is important and useful to distinguish temporal topics from stable topics in social media. However, this distinction is very challenging because user-generated texts in social media are very short and thus lack the linguistic features that traditional approaches need for precise analysis. In this paper, we propose a novel solution that detects both stable and temporal topics simultaneously from social media data. Specifically, we propose a unified user-temporal mixture model to distinguish temporal topics from stable topics. To improve the model's performance, we design a regularization framework that exploits prior spatial information in a social network, as well as a burst-weighted smoothing scheme that exploits prior temporal information in the time dimension. We conduct extensive experiments to evaluate our proposal on two real data sets obtained from Del.icio.us and Twitter. The experimental results verify that our mixture model is able to distinguish temporal topics from stable topics in a single detection process. Our mixture model, enhanced with the spatial regularization and the burst-weighted smoothing scheme, significantly outperforms competitor approaches in terms of topic detection accuracy and discrimination between stable and temporal topics.
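The abstract does not state the model's equations, so the following is only an illustrative sketch of what a user-temporal mixture could look like, not the paper's actual formulation. It assumes that a word posted by a given user at a given time is drawn from the user's stable topics with probability `lam` and from the time slice's temporal topics with probability `1 - lam`. The function name, the parameter names, the mixing weight `lam`, and the toy distributions are all hypothetical.

```python
from typing import Dict, List


def word_probability(
    word: str,
    p_topic_given_user: List[float],
    p_topic_given_time: List[float],
    stable_topics: List[Dict[str, float]],
    temporal_topics: List[Dict[str, float]],
    lam: float,
) -> float:
    """Probability of `word` under an assumed two-component mixture.

    Stable component:   sum over z of p(z | user) * p(word | z)
    Temporal component: sum over z of p(z | time) * p(word | z)
    `lam` (hypothetical) is the weight of the stable component.
    """
    stable = sum(
        p_z * topic.get(word, 0.0)
        for p_z, topic in zip(p_topic_given_user, stable_topics)
    )
    temporal = sum(
        p_z * topic.get(word, 0.0)
        for p_z, topic in zip(p_topic_given_time, temporal_topics)
    )
    return lam * stable + (1.0 - lam) * temporal


# Toy data (invented for illustration): one stable and one temporal topic.
stable_topics = [{"python": 0.5, "music": 0.5}]
temporal_topics = [{"earthquake": 0.8, "python": 0.2}]

p_python = word_probability(
    "python", [1.0], [1.0], stable_topics, temporal_topics, lam=0.7
)
p_quake = word_probability(
    "earthquake", [1.0], [1.0], stable_topics, temporal_topics, lam=0.7
)
```

In this toy setting, "earthquake" receives probability mass only from the temporal component, while "python" is explained mostly by the user's stable interests. That separation is the kind of distinction the abstract describes, under the assumptions stated above.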
