Discovering global and local bursts in a stream of news

Reports on major events such as hurricanes and earthquakes, and on major topics such as the financial crisis or the Egyptian revolution, appear in Internet news and are updated, regularly or irregularly, as new insights are acquired. Tracking emerging subtopics within a major or even a local event is important for news readers but challenging for the operator: subtopics may emerge gradually or in bursts, and they may matter within the event yet be too rare to be visible in the whole stream of news. In this study, we propose a text stream clustering method that detects, tracks and updates large and small bursts of news in a two-level topic hierarchy. We report first results on a stream of news from February to April 2011.
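
The distinction between a global and a local burst can be illustrated with a minimal sketch. It is not the clustering method proposed in the study, whose details are not given here; it only shows why a subtopic can burst inside its parent topic while staying invisible in the whole stream. The function name `burst_windows`, the thresholds `ratio` and `min_share`, and all counts are hypothetical values chosen for illustration.

```python
def burst_windows(counts, totals, ratio=2.0, min_share=0.05):
    """Return indices of time windows in which a subtopic bursts.

    A window counts as a burst when the subtopic's share of the reference
    population (counts[i] / totals[i]) is at least `min_share` and at least
    `ratio` times the mean share over all earlier windows.
    Both thresholds are illustrative assumptions, not values from the study.
    """
    bursts = []
    earlier_shares = []
    for i, (count, total) in enumerate(zip(counts, totals)):
        share = count / total if total else 0.0
        if earlier_shares:
            baseline = sum(earlier_shares) / len(earlier_shares)
            if share >= min_share and share >= ratio * max(baseline, 1e-9):
                bursts.append(i)
        earlier_shares.append(share)
    return bursts


# Hypothetical counts over four time windows.
subtopic_docs = [1, 1, 10, 12]            # documents on the subtopic
topic_docs = [20, 20, 20, 20]             # documents on its parent topic
stream_docs = [1000, 1000, 1000, 1000]    # documents in the whole stream

# Local view: the subtopic measured against its parent topic.
local_bursts = burst_windows(subtopic_docs, topic_docs)
# Global view: the same subtopic measured against the whole stream.
global_bursts = burst_windows(subtopic_docs, stream_docs)

print("local bursts:", local_bursts)
print("global bursts:", global_bursts)
```

With these made-up numbers the subtopic grows from 5% to 50% and then 60% of its parent topic, so it is flagged locally, while its share of the whole stream never exceeds 1.2% and it is not flagged globally. Measuring bursts at two levels of a topic hierarchy is one way to make such small bursts visible.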
