Particle Filter Rejuvenation and Latent Dirichlet Allocation

Previous research has established several methods of online learning for latent Dirichlet allocation (LDA). However, streaming learning for LDA, which allows only one pass over the data and requires constant storage complexity, is not as well explored. We use reservoir sampling to reduce the storage complexity of a previously studied online algorithm, namely the particle filter, to constant. We then show that a simpler particle filter implementation performs just as well, and that the quality of the initialization dominates other factors of performance.
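
To illustrate the reservoir sampling idea mentioned in the abstract, here is a minimal Python sketch of the classic single-pass scheme (often called Algorithm R). It is not the authors' implementation: the function name `reservoir_sample` and its parameters are hypothetical, and the way the paper applies a reservoir inside the particle filter is not shown. The sketch only demonstrates how a fixed-size uniform sample can be maintained over a stream of unknown length using constant storage.

```python
import random


def reservoir_sample(stream, k, rng=None):
    """Keep a uniform random sample of at most k items from a stream.

    Hypothetical helper for illustration only. Uses O(k) memory and
    a single pass over `stream`, regardless of the stream's length.
    """
    rng = rng or random.Random()
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Pick a slot uniformly from 0..i (inclusive); the new item
            # replaces a stored one with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = item
    return reservoir


if __name__ == "__main__":
    # Example: sample 5 token positions from a stream of 10,000.
    print(reservoir_sample(range(10_000), 5, random.Random(0)))
```

Each item in the stream ends up in the reservoir with equal probability, so the stored sample stays representative of everything seen so far while the memory footprint stays fixed at `k` items.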
