Paper Information - Sometimes Average is Best: The Importance of Averaging for Prediction using MCMC Inference in Topic Modeling

Markov chain Monte Carlo (MCMC) approximates the posterior distribution of latent variable models by generating many samples and averaging over them. In practice, however, it is often more convenient to cut corners, using only a single sample or following a suboptimal averaging strategy. We systematically study different strategies for averaging MCMC samples and show empirically that averaging properly leads to significant improvements in prediction.
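To make the contrast in the abstract concrete, the following is a minimal sketch of single-sample prediction versus prediction averaged over several samples. It is not the authors' implementation: the toy linear predictor, the function names, and the `burn_in` and `lag` parameters are all illustrative assumptions (discarding early samples and thinning by a lag are common MCMC practice, but the abstract does not say which strategies the paper compares).

```python
from typing import Sequence


def predict(weight: float, x: float) -> float:
    """Toy predictor (an assumption): one sampled weight times the input."""
    return weight * x


def single_sample_prediction(samples: Sequence[float], x: float) -> float:
    """The 'cut corners' strategy: predict from the last sample only."""
    if not samples:
        raise ValueError("need at least one sample")
    return predict(samples[-1], x)


def averaged_prediction(
    samples: Sequence[float], x: float, burn_in: int = 0, lag: int = 1
) -> float:
    """Average the predictions made from each retained sample.

    burn_in: number of initial samples to discard (hypothetical parameter).
    lag:     keep every `lag`-th sample after burn-in (hypothetical parameter).
    """
    if lag < 1:
        raise ValueError("lag must be at least 1")
    kept = samples[burn_in::lag]
    if not kept:
        raise ValueError("no samples left after burn-in and thinning")
    return sum(predict(w, x) for w in kept) / len(kept)


# Hand-written stand-in values, not output from a real sampler.
chain = [0.0, 10.0, 1.0, 3.0, 2.0, 4.0]

print(single_sample_prediction(chain, x=2.0))        # 8.0
print(averaged_prediction(chain, x=2.0, burn_in=2))  # 5.0
```

The single-sample result depends entirely on where the chain happened to stop, while the averaged result uses every retained sample. For a predictor that is not linear in the sampled parameters, averaging predictions differs from predicting with averaged parameters, which is one reason the choice of averaging strategy can matter.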

Philip Resnik | Viet-An Nguyen | Jordan Boyd-Graber

[1] Philip Resnik, et al. Holistic Sentiment Analysis Across Languages: Multilingual Supervised Latent Dirichlet Allocation, 2010, EMNLP.

[2] Michael I. Jordan, et al. DiscLDA: Discriminative Learning for Dimensionality Reduction and Classification, 2008, NIPS.

[3] David M. Blei, et al. Build, Compute, Critique, Repeat: Data Analysis with Latent Variable Models, 2014.

[4] Hanna Wallach, et al. Structured Topic Models for Language, 2008.

[5] Susan T. Dumais, et al. Partially Labeled Topic Models for Interpretable Text Mining, 2011, KDD.

[6] David M. Blei, et al. Supervised Topic Models, 2007, NIPS.

[7] Philip Resnik, et al. Gibbs Sampling for the Uninitiated, 2010.

[8] Chong Wang, et al. Simultaneous Image Classification and Annotation, 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[9] Nando de Freitas, et al. An Introduction to MCMC for Machine Learning, 2004, Machine Learning.

[10] Eric P. Xing, et al. MedLDA: Maximum Margin Supervised Topic Models for Regression and Classification, 2009, ICML '09.

[11] Radford M. Neal. Probabilistic Inference Using Markov Chain Monte Carlo Methods, 2011.