Discovering evolutionary theme patterns from text: an exploration of temporal text mining

Temporal Text Mining (TTM) is concerned with discovering temporal patterns in text collected over time. Since most text information carries time stamps, TTM has applications in many domains, such as summarizing events in news articles and revealing research trends in scientific literature. In this paper, we study a particular TTM task: discovering and summarizing the evolutionary patterns of themes in a text stream. We define this new text mining problem and present general probabilistic methods for solving it through (1) discovering latent themes from text; (2) constructing an evolution graph of themes; and (3) analyzing the life cycles of themes. An evaluation of the proposed methods in two domains (news articles and scientific literature) shows that they can effectively discover interesting evolutionary theme patterns.
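Step (2) can be illustrated with a minimal sketch. It assumes that step (1) has already produced a set of themes for each time interval, with each theme represented as a word distribution. It then links a theme to a theme in a later interval when the KL divergence between them falls below a threshold. The abstract does not specify the similarity measure. KL divergence, the smoothing constant `eps`, the threshold value, and the hypothetical helper names `kl_divergence` and `build_evolution_graph` are all illustrative assumptions, not the paper's exact formulation.

```python
import math
from typing import Dict, List, Tuple

# A theme is a word distribution: word -> probability (assumed to sum to 1).
Theme = Dict[str, float]
Node = Tuple[int, int]  # (interval index, theme index within that interval)


def kl_divergence(p: Theme, q: Theme, eps: float = 1e-6) -> float:
    """D(p || q) over the union vocabulary.

    Assumption: additive smoothing with `eps`, followed by renormalization,
    so that words missing from q do not make the divergence infinite.
    """
    vocab = set(p) | set(q)
    p_s = {w: p.get(w, 0.0) + eps for w in vocab}
    q_s = {w: q.get(w, 0.0) + eps for w in vocab}
    zp, zq = sum(p_s.values()), sum(q_s.values())
    return sum(
        (p_s[w] / zp) * math.log((p_s[w] / zp) / (q_s[w] / zq)) for w in vocab
    )


def build_evolution_graph(
    themes_by_interval: List[List[Theme]], threshold: float
) -> List[Tuple[Node, Node]]:
    """Return directed edges (earlier_node, later_node).

    Hypothetical rule: a later theme is said to evolve from an earlier one
    when D(later || earlier) < threshold.
    """
    edges: List[Tuple[Node, Node]] = []
    for t1, earlier in enumerate(themes_by_interval):
        for t2 in range(t1 + 1, len(themes_by_interval)):
            for i, old in enumerate(earlier):
                for j, new in enumerate(themes_by_interval[t2]):
                    # Asymmetric test: how well the earlier theme explains the later one.
                    if kl_divergence(new, old) < threshold:
                        edges.append(((t1, i), (t2, j)))
    return edges


if __name__ == "__main__":
    themes = [
        [{"tsunami": 0.5, "earthquake": 0.5}],
        [{"tsunami": 0.45, "earthquake": 0.45, "aid": 0.1}, {"election": 1.0}],
    ]
    print(build_evolution_graph(themes, threshold=2.0))  # [((0, 0), (1, 0))]
```

In this toy input, the later "tsunami/earthquake/aid" theme stays close to the earlier theme (divergence of about 1.06) and is linked to it. The "election" theme shares no words with the earlier theme (divergence of about 13.8) and gets no incoming edge. The divergence is asymmetric, so the direction D(later || earlier) is a design choice, and comparing every pair of intervals is quadratic. A real system would likely restrict comparisons to nearby intervals.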
