Timeline: A Dynamic Hierarchical Dirichlet Process Model for Recovering Birth/Death and Evolution of Topics in Text Stream

Topic models have proven to be a useful tool for discovering latent structure in document collections. However, many document collections arrive as temporal streams, so several aspects of the latent structure, such as the number of topics, the topics' word distributions, and their popularity, evolve over time. Existing models capture the evolution of some, but not all, of these aspects. In this paper, we introduce the infinite dynamic topic model (iDTM), which can accommodate the evolution of all of them. Our model assumes that documents are organized into epochs: documents within each epoch are exchangeable, but the order between epochs is maintained. iDTM allows for an unbounded number of topics: topics can be born or die at any epoch, and the representation of each topic can evolve according to Markovian dynamics. We use iDTM to analyze the birth and evolution of topics in the NIPS community, and we evaluate the efficacy of our model on both simulated and real datasets with favorable results.
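To make the epoch-based birth/death mechanism concrete, here is a minimal sketch under heavily simplifying assumptions. It is not the iDTM generative process from the paper. Each document draws a single topic (iDTM mixes topics within documents). Within an epoch, a topic's prior mass is its document count from the previous epoch plus its count so far in the current epoch, and a hypothetical concentration `alpha` reserves mass for a brand-new topic, in the style of a recurrent Chinese restaurant process. A topic used in epoch t-1 but by no document in epoch t dies. Each surviving topic's natural parameters take a Gaussian random-walk step with an assumed variance `rho`, one simple choice of Markovian dynamics. The function name `simulate_topic_timeline` and all parameters are illustrative.

```python
import numpy as np


def softmax(x):
    """Map natural parameters (logits) to a probability vector over words."""
    z = np.exp(x - x.max())
    return z / z.sum()


def simulate_topic_timeline(num_epochs, docs_per_epoch, vocab_size,
                            alpha=1.0, rho=0.1, seed=0):
    """Simulate topic birth, death, and drift across epochs (hypothetical sketch).

    Simplifying assumptions (not the paper's model):
      * each document is assigned exactly one topic;
      * prior mass of an existing topic = its doc count in the previous epoch
        plus its doc count so far in the current epoch; alpha = mass of a new topic;
      * a topic alive in epoch t-1 that receives no documents in epoch t dies;
      * surviving topics' logits follow a Gaussian random walk with variance rho.
    Returns per-epoch topic assignments and the word distributions of the
    topics alive after the final epoch.
    """
    rng = np.random.default_rng(seed)
    logits = {}        # topic id -> current natural parameters
    prev_counts = {}   # topic id -> number of documents in the previous epoch
    next_id = 0
    history = []
    for _ in range(num_epochs):
        # Markovian evolution: topics carried over from the last epoch drift.
        for k in prev_counts:
            logits[k] = logits[k] + rng.normal(0.0, np.sqrt(rho), vocab_size)
        counts = {}
        assignments = []
        for _ in range(docs_per_epoch):
            ks = sorted(set(prev_counts) | set(counts))
            weights = np.array(
                [prev_counts.get(k, 0) + counts.get(k, 0) for k in ks] + [alpha],
                dtype=float,
            )
            choice = rng.choice(len(weights), p=weights / weights.sum())
            if choice == len(ks):
                # Birth: a new topic with freshly drawn natural parameters.
                k = next_id
                next_id += 1
                logits[k] = rng.normal(0.0, 1.0, vocab_size)
            else:
                k = ks[choice]
            counts[k] = counts.get(k, 0) + 1
            assignments.append(k)
        # Death: topics from the previous epoch that no document chose.
        for k in set(prev_counts) - set(counts):
            del logits[k]
        prev_counts = counts
        history.append(assignments)
    return history, {k: softmax(v) for k, v in logits.items()}
```

For example, `simulate_topic_timeline(5, 20, 10, seed=1)` returns five lists of twenty topic ids plus the word distributions of the topics still alive at the end. Within an epoch the assignment rule is a Pólya-urn scheme, so documents in the same epoch are exchangeable, while the dependence on the previous epoch's counts preserves the ordering between epochs.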
