Online Learning for Latent Dirichlet Allocation

We develop an online variational Bayes (VB) algorithm for Latent Dirichlet Allocation (LDA). Online LDA is based on online stochastic optimization with a natural gradient step, which we show converges to a local optimum of the VB objective function. It can handily analyze massive document collections, including those arriving in a stream. We study the performance of online LDA in several ways, including by fitting a 100-topic topic model to 3.3M articles from Wikipedia in a single pass. We demonstrate that online LDA finds topic models as good as or better than those found with batch VB, and in a fraction of the time.
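The abstract names the core mechanism (a stochastic natural gradient step on the VB objective) without spelling it out. As a rough illustration, the sketch below shows the shape of that step as it is usually stated for online LDA: a decaying step size rho_t = (tau0 + t)^(-kappa) blends the current topic parameters with an estimate computed from one mini-batch. The function names, the list-of-lists layout, and the default values are assumptions made for this example, and the per-document E-step that would produce `sstats` is deliberately left out.

```python
def step_size(t, tau0=1.0, kappa=0.7):
    """Decaying step size rho_t = (tau0 + t) ** (-kappa).

    kappa in (0.5, 1] is the usual Robbins-Monro condition; tau0 > 0 is
    assumed here so that the very first step (t = 0) is well defined.
    """
    if not 0.5 < kappa <= 1.0:
        raise ValueError("kappa must lie in (0.5, 1]")
    if tau0 <= 0:
        raise ValueError("tau0 must be positive in this sketch")
    return (tau0 + t) ** (-kappa)


def online_update(lam, sstats, t, num_docs, batch_size,
                  eta=0.01, tau0=1.0, kappa=0.7):
    """One natural-gradient step on the topic-word parameters.

    lam     : K x V list of lists, current variational topic parameters
    sstats  : K x V list of lists, expected word counts per topic from the
              current mini-batch (assumed to come from an E-step not shown)
    Returns a new K x V list of lists; the inputs are not modified.
    """
    rho = step_size(t, tau0, kappa)
    scale = num_docs / batch_size  # pretend the mini-batch is the whole corpus
    return [
        [(1.0 - rho) * l + rho * (eta + scale * s)
         for l, s in zip(lam_k, ss_k)]
        for lam_k, ss_k in zip(lam, sstats)
    ]
```

Because the update is a weighted average, earlier mini-batches are gradually forgotten at a rate set by `kappa`, which is what lets a single pass over a stream work. Values of `kappa` closer to 0.5 forget faster; values closer to 1 move more cautiously.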

Francis R. Bach | David M. Blei | Matthew D. Hoffman

[1] D. Rubin, et al. Maximum likelihood from incomplete data via the EM algorithm plus discussions on the paper, 1977.

[2] Michael A. Arbib, et al. The handbook of brain theory and neural networks, 1995, A Bradford book.

[3] Daniel A. Keim, et al. On Knowledge Discovery and Data Mining, 1997.

[4] Léon Bottou. Online Learning and Stochastic Approximations, 1998.

[5] Geoffrey E. Hinton, et al. A View of the EM Algorithm that Justifies Incremental, Sparse, and other Variants, 1998, Learning in Graphical Models.

[6] Léon Bottou, et al. On-line learning and stochastic approximations, 1999.

[7] Hagai Attias, et al. A Variational Bayesian Framework for Graphical Models, 1999.

[8] Shin Ishii, et al. On-line EM Algorithm for the Normalized Gaussian Network, 2000, Neural Computation.

[9] Masa-aki Sato, et al. Online Model Selection Based on the Variational Bayes, 2001, Neural Computation.

[10] Wray L. Buntine. Variational Extensions to EM and Multinomial PCA, 2002, ECML.

[11] Michael I. Jordan, et al. Latent Dirichlet Allocation, 2001, J. Mach. Learn. Res.

[12] Michael I. Jordan, et al. Variational methods for the Dirichlet process, 2004, ICML.

[13] Michael I. Jordan, et al. An Introduction to Variational Methods for Graphical Models, 1999, Machine Learning.

[14] Mark Steyvers, et al. Finding scientific topics, 2004, Proceedings of the National Academy of Sciences of the United States of America.

[15] Ching-Yung Lin, et al. Modeling and predicting personal information dissemination behavior, 2005, KDD.

[16] H. Robbins. A Stochastic Approximation Method, 1951.

[17] Léon Bottou, et al. The Tradeoffs of Large Scale Learning, 2007, NIPS.

[18] Max Welling, et al. Distributed Inference for Latent Dirichlet Allocation, 2007, NIPS.

[19] Dan Klein, et al. Online EM for Unsupervised Models, 2009, NAACL.

[20] Yee Whye Teh, et al. On Smoothing and Inference for Topic Models, 2009, UAI.

[21] Feng Yan, et al. Parallel Inference for Latent Dirichlet Allocation on Graphics Processing Units, 2009, NIPS.

[22] Thomas L. Griffiths, et al. Online Inference of Topics with Latent Dirichlet Allocation, 2009, AISTATS.

[23] Andrew McCallum, et al. Efficient methods for topic model inference on streaming document collections, 2009, KDD.

[24] Andrew McCallum, et al. Rethinking LDA: Why Priors Matter, 2009, NIPS.

[25] Chong Wang, et al. Reading Tea Leaves: How Humans Interpret Topic Models, 2009, NIPS.

[26] Jon D. McAuliffe, et al. Variational Inference for Large-Scale Models of Discrete Choice, 2007, arXiv:0712.2526.

[27] Guillermo Sapiro, et al. Online Learning for Matrix Factorization and Sparse Coding, 2009, J. Mach. Learn. Res.