Topics over time: a non-Markov continuous-time model of topical trends

This paper presents an LDA-style topic model that captures not only the low-dimensional structure of data, but also how the structure changes over time. Unlike other recent work that relies on Markov assumptions or discretization of time, here each topic is associated with a continuous distribution over timestamps, and for each generated document, the mixture distribution over topics is influenced by both word co-occurrences and the document's timestamp. Thus, the meaning of a particular topic can be relied upon as constant, but the topics' occurrence and correlations change significantly over time. We present results on nine months of personal email, 17 years of NIPS research papers and over 200 years of presidential state-of-the-union addresses, showing improved topics, better timestamp prediction, and interpretable trends.
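
The abstract says each topic carries a continuous distribution over timestamps, and that a document's topic mixture is shaped jointly by its words and its timestamp. The sketch below illustrates that idea in plain Python under several assumptions the abstract does not state. Timestamps are normalized to [0, 1]. Each topic z has a Beta(a, b) time distribution with parameters psi[z]. A timestamp is drawn next to every token, so a token's topic must explain both its word and its time. The helper names (sample_dirichlet, generate_document, beta_logpdf, predict_timestamp, fit_beta_moments) and all parameter values are hypothetical, and posterior inference over topic assignments is omitted.

```python
import math
import random


def sample_dirichlet(alpha, rng):
    """Draw a probability vector from Dirichlet(alpha) via normalized Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(alpha, phi, psi, n_words, rng):
    """Sample one toy document under an assumed topics-over-time generative process.

    alpha: Dirichlet prior over K topics (hypothetical values)
    phi:   K word distributions, each a list of V probabilities
    psi:   K (a, b) pairs; topic z's timestamps ~ Beta(a, b) on [0, 1] (assumed family)
    Returns parallel lists of word ids and per-token timestamps.
    """
    theta = sample_dirichlet(alpha, rng)  # document's topic mixture
    topic_ids = range(len(alpha))
    words, stamps = [], []
    for _ in range(n_words):
        z = rng.choices(topic_ids, weights=theta)[0]             # topic for this token
        w = rng.choices(range(len(phi[z])), weights=phi[z])[0]   # word given topic
        a, b = psi[z]
        words.append(w)
        stamps.append(rng.betavariate(a, b))                     # timestamp given topic
    return words, stamps


def beta_logpdf(t, a, b):
    """Log density of Beta(a, b) at t in the open interval (0, 1)."""
    log_norm = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return (a - 1) * math.log(t) + (b - 1) * math.log(1 - t) - log_norm


def predict_timestamp(topic_assignments, psi, grid_size=99):
    """Return the grid time maximizing the summed per-token Beta log density."""
    grid = [(i + 1) / (grid_size + 1) for i in range(grid_size)]

    def score(t):
        return sum(beta_logpdf(t, *psi[z]) for z in topic_assignments)

    return max(grid, key=score)


def fit_beta_moments(samples):
    """Method-of-moments Beta fit; assumes 0 < variance < mean * (1 - mean)."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((x - mean) ** 2 for x in samples) / n
    common = mean * (1 - mean) / var - 1
    return mean * common, (1 - mean) * common
```

Given topic assignments for a document's tokens, predict_timestamp picks the time most compatible with all of those topics at once. This is one plausible reading of "timestamp prediction," not necessarily the paper's procedure. fit_beta_moments shows one simple way to re-estimate a topic's Beta parameters from the timestamps currently assigned to it. It too is an assumption rather than a step stated in the abstract.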
