Online Learning for Latent Dirichlet Allocation

We develop an online variational Bayes (VB) algorithm for Latent Dirichlet Allocation (LDA). Online LDA is based on online stochastic optimization with a natural gradient step, which we show converges to a local optimum of the VB objective function. It can handily analyze massive document collections, including those arriving in a stream. We study the performance of online LDA in several ways, including by fitting a 100-topic model to 3.3M articles from Wikipedia in a single pass. We demonstrate that online LDA finds topic models as good as or better than those found with batch VB, and in a fraction of the time.
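The abstract names the ingredients (a per-document variational step, then a stochastic natural gradient step on the topic parameters) without giving the updates. The following is a minimal sketch of how such a loop is commonly written, not the authors' implementation. It assumes one document per step rather than mini-batches, a decaying step size of the form (tau0 + t)^(-kappa), and symmetric Dirichlet hyperparameters; the names `alpha`, `eta`, `tau0`, `kappa`, `lam`, and all function names are illustrative assumptions. The digamma function is approximated by hand so that the sketch needs only the standard library.

```python
import math
import random

def digamma(x):
    """Digamma for x > 0: recurrence psi(x) = psi(x+1) - 1/x, then an asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    return result + math.log(x) - 0.5 / x - inv2 * (1 / 12 - inv2 * (1 / 120 - inv2 / 252))

def exp_elog_dirichlet(params):
    """exp(E[log p_i]) for p ~ Dirichlet(params)."""
    total = digamma(sum(params))
    return [math.exp(digamma(a) - total) for a in params]

def expected_counts(doc, exp_elog_theta, exp_elog_beta):
    """Split each word count across topics in proportion to the variational weights."""
    K = len(exp_elog_beta)
    stats = [dict() for _ in range(K)]
    for w, n in doc.items():
        weights = [exp_elog_theta[k] * exp_elog_beta[k][w] for k in range(K)]
        norm = sum(weights)
        for k in range(K):
            stats[k][w] = n * weights[k] / norm
    return stats

def e_step(doc, exp_elog_beta, alpha, n_iter=50):
    """Per-document step. doc maps word id -> count; topics are held fixed."""
    K = len(exp_elog_beta)
    gamma = [1.0] * K
    for _ in range(n_iter):
        stats = expected_counts(doc, exp_elog_dirichlet(gamma), exp_elog_beta)
        gamma = [alpha + sum(stats[k].values()) for k in range(K)]
    stats = expected_counts(doc, exp_elog_dirichlet(gamma), exp_elog_beta)
    return gamma, stats

def step_size(t, tau0, kappa):
    """Assumed decaying schedule; kappa in (0.5, 1] gives the usual
    stochastic-approximation conditions on the step sizes."""
    return (tau0 + t) ** (-kappa)

def online_lda(doc_stream, D, V, K, alpha=0.1, eta=0.01, tau0=1.0, kappa=0.7, seed=0):
    """Single pass over doc_stream; D is the (assumed known) corpus size."""
    rng = random.Random(seed)
    # Random positive initialisation breaks the symmetry between topics.
    lam = [[rng.gammavariate(100.0, 0.01) for _ in range(V)] for _ in range(K)]
    for t, doc in enumerate(doc_stream):
        rho = step_size(t, tau0, kappa)
        exp_elog_beta = [exp_elog_dirichlet(row) for row in lam]
        _, stats = e_step(doc, exp_elog_beta, alpha)
        for k in range(K):
            for w in range(V):
                # Estimate of lambda as if the whole corpus were D copies of this document.
                lam_hat = eta + D * stats[k].get(w, 0.0)
                # Blending old and new estimates is the natural gradient step.
                lam[k][w] = (1.0 - rho) * lam[k][w] + rho * lam_hat
    return lam

if __name__ == "__main__":
    # Toy stream over a 4-word vocabulary (hypothetical data).
    stream = [{0: 3, 1: 1}, {2: 2, 3: 2}, {0: 1, 1: 2}, {2: 1, 3: 3}]
    topics = online_lda(stream, D=len(stream), V=4, K=2)
    for row in topics:
        print([round(v, 3) for v in row])
```

Because each step touches only the current document, memory use does not grow with the number of documents seen, which is what makes a single pass over a stream feasible. In practice one would process mini-batches and use a vectorised digamma; both are omitted here to keep the sketch short.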
