Online multiscale dynamic topic models

We propose an online topic model for sequentially analyzing the time evolution of topics in document collections. Topics naturally evolve on multiple timescales. For example, some words may be used consistently over one hundred years, while other words emerge and disappear over periods of a few days. Thus, in the proposed model, the current topic-specific distributions over words are assumed to be generated from the multiscale word distributions of the previous epoch. Considering both the long-timescale and the short-timescale dependencies yields a more robust model. We derive efficient online inference procedures based on a stochastic EM algorithm, in which the model is sequentially updated using newly obtained data; past data therefore do not need to be stored or revisited for inference. We demonstrate the effectiveness of the proposed method in terms of predictive performance and computational efficiency on collections of real, timestamped documents.
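
The idea of generating the current word distribution from multiscale distributions of earlier epochs can be illustrated with a minimal sketch. This is not the model's actual specification: the window-based definition of a timescale, the function names `multiscale_word_distributions` and `dirichlet_prior`, and the per-scale weights are all assumptions made for illustration. Here each timescale is taken to be the normalized word frequency of one topic over the most recent `s` epochs, and the prior for the current epoch is a weighted sum of those per-scale distributions, used as Dirichlet pseudo-counts.

```python
import numpy as np


def multiscale_word_distributions(counts, scales):
    """Hypothetical helper: per-scale word distributions for one topic.

    counts: (T, V) array of word counts over T past epochs, oldest first.
    scales: window lengths in epochs (each >= 1), e.g. [1, 2, 4].
    Returns an (S, V) array whose row i is the normalized word frequency
    over the most recent scales[i] epochs.
    """
    counts = np.asarray(counts, dtype=float)
    vocab_size = counts.shape[1]
    dists = []
    for s in scales:
        window = counts[-s:].sum(axis=0)  # pool the last s epochs
        total = window.sum()
        if total > 0:
            dists.append(window / total)
        else:
            # Assumption: fall back to uniform when the window is empty.
            dists.append(np.full(vocab_size, 1.0 / vocab_size))
    return np.vstack(dists)


def dirichlet_prior(dists, weights):
    """Hypothetical helper: weighted sum of per-scale distributions,
    interpreted as Dirichlet pseudo-counts for the current epoch."""
    return np.asarray(weights, dtype=float) @ dists


# Three past epochs, vocabulary of two words.
past_counts = [[8, 0], [0, 0], [1, 3]]
dists = multiscale_word_distributions(past_counts, scales=[1, 3])
prior = dirichlet_prior(dists, weights=[1.0, 3.0])
```

In this toy example the short scale follows the most recent epoch, which favors the second word, while the long scale still reflects the earlier dominance of the first word. The weights decide how much each timescale contributes to the prior; in the sketch they are fixed by hand rather than estimated from data.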
