A Refined Method for Detecting Interpretable and Real-Time Bursty Topic in Microblog Stream

The real-time detection of bursty topics on microblogs has attracted considerable research effort in recent years, owing to its wide use in user-focused tasks such as information recommendation, trend analysis, and document search. Most existing methods perform well on real-time detection but give little consideration to topic coherence and topic granularity, both of which matter for semantic interpretability. As a result, they often produce odd topics that are hard to interpret. More effort is therefore needed to evaluate and improve the intrinsic quality of detected topics at their earliest stages. In this paper, we propose a refined tensor decomposition model that detects bursty topics effectively while also evaluating topic coherence and providing informative bursty topics at different burst levels. We evaluated our method on a stream of 7 million microblogs. The experimental results demonstrate both efficiency in topic detection and effectiveness in topic interpretability. Specifically, running on a single machine, our method can consistently handle millions of microblogs per day and present ranked, interpretable topics at different burst levels.
