Topic Evolution in a Stream of Documents

Abstract: Document collections evolve over time: new topics emerge and old ones decline. At the same time, the terminology evolves as well. Much of the literature is devoted to topic evolution in finite document sequences and assumes a fixed vocabulary. In this study, we propose "Topic Monitor" for monitoring and understanding topic and vocabulary evolution over an infinite document sequence, i.e., a stream. We use Probabilistic Latent Semantic Analysis (PLSA) for topic modeling and propose new folding-in techniques for topic adaptation under an evolving vocabulary. We extract a series of models, on which we detect index-based topic threads as human-interpretable descriptions of topic evolution.
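For orientation, the following is a minimal sketch of the classic PLSA folding-in step that the abstract builds on. It is not the new folding-in technique proposed in the study. The topic-word distributions P(w|z) are held fixed and only the topic mixture P(z|d) of a new document is estimated by EM. The function name `fold_in`, its parameters, and the toy data are assumptions made for illustration.

```python
def fold_in(doc_counts, p_w_given_z, n_iter=50):
    """Estimate P(z|d) for a new document while keeping P(w|z) fixed.

    doc_counts:  dict mapping word -> count in the new document
    p_w_given_z: list of dicts, one per topic, mapping word -> P(w|z)
    (Both structures are illustrative assumptions, not from the paper.)
    """
    n_topics = len(p_w_given_z)
    p_z = [1.0 / n_topics] * n_topics  # uniform initial topic mixture
    for _ in range(n_iter):
        new_p_z = [0.0] * n_topics
        for word, count in doc_counts.items():
            # E-step: P(z|d,w) is proportional to P(w|z) * P(z|d)
            joint = [p_w_given_z[z].get(word, 0.0) * p_z[z]
                     for z in range(n_topics)]
            norm = sum(joint)
            if norm == 0.0:
                # Word unknown to every topic: skipped in this sketch.
                continue
            for z in range(n_topics):
                new_p_z[z] += count * joint[z] / norm
        total = sum(new_p_z)
        if total == 0.0:
            break  # no known words at all; keep the previous mixture
        # M-step: renormalise the expected counts into P(z|d)
        p_z = [value / total for value in new_p_z]
    return p_z


# Toy model with two topics over a fixed four-word vocabulary.
topics = [
    {"stream": 0.5, "data": 0.5},
    {"topic": 0.5, "model": 0.5},
]
new_doc = {"topic": 3, "model": 2, "neologism": 4}
mixture = fold_in(new_doc, topics)
```

In this sketch the word "neologism" is silently ignored because it is outside the fixed vocabulary. That is the limitation of folding-in under a fixed vocabulary, which the abstract's evolving-vocabulary setting is meant to address.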
