Paper information - Distributed Stochastic Gradient MCMC

Distributed Stochastic Gradient MCMC

Probabilistic inference at big-data scale is becoming increasingly relevant to both the machine learning and statistics communities. Here we introduce the first fully distributed MCMC algorithm based on stochastic gradients. We argue that stochastic gradient MCMC algorithms are particularly suited to distributed inference because individual chains can draw mini-batches from their local pool of data for a flexible amount of time before jumping to or syncing with other chains. This greatly reduces communication overhead and allows adaptive load balancing. Our experiments for LDA on Wikipedia and PubMed show that, relative to the state of the art in distributed MCMC, we reduce the compute time needed to reach the same perplexity level from 27 hours to half an hour.
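
The idea of a chain working on local mini-batches for a while before jumping to another pool of data can be illustrated with a toy sketch. The code below is not the authors' implementation. It assumes a one-dimensional Gaussian model (unit-variance likelihood, zero-mean Gaussian prior), a single chain, and a stochastic gradient Langevin update. The names `run_chain`, `steps_per_visit`, and `visit_prob` are hypothetical. The gradient is rescaled by shard size over visit probability, which is one way to keep the local estimate unbiased for the full-data gradient when averaged over shard visits.

```python
import numpy as np


def run_chain(shards, n_rounds=200, steps_per_visit=20, batch_size=32,
              step_size=1e-5, prior_var=100.0, seed=0):
    """Toy single-chain sketch: Langevin steps on one local shard, then a jump.

    Assumed model (for illustration only): x_i ~ N(theta, 1), theta ~ N(0, prior_var).
    """
    rng = np.random.default_rng(seed)
    sizes = np.array([len(s) for s in shards], dtype=float)
    # Assumption: shards are visited with probability proportional to their size.
    visit_prob = sizes / sizes.sum()
    theta = 0.0
    samples = []
    for _ in range(n_rounds):
        # "Jump": pick the shard (worker-local data pool) to work on next.
        s = rng.choice(len(shards), p=visit_prob)
        local = shards[s]
        for _ in range(steps_per_visit):
            # Mini-batch drawn only from the local shard, so no communication here.
            n = min(batch_size, len(local))
            batch = rng.choice(local, size=n, replace=False)
            grad_prior = -theta / prior_var
            mean_grad_lik = np.mean(batch - theta)
            # Rescale so the expectation over shard visits matches the full-data gradient.
            scale = sizes[s] / visit_prob[s]
            drift = grad_prior + scale * mean_grad_lik
            # Langevin step: half-step along the gradient plus Gaussian noise
            # with variance equal to the step size.
            noise = rng.normal(0.0, np.sqrt(step_size))
            theta = theta + 0.5 * step_size * drift + noise
            samples.append(theta)
    return np.array(samples)


if __name__ == "__main__":
    data = np.random.default_rng(1).normal(2.0, 1.0, size=4000)
    shards = np.array_split(data, 4)  # four hypothetical workers
    samples = run_chain(shards)
    print("posterior mean estimate:", samples[500:].mean())
```

Here `steps_per_visit` plays the role of the "flexible amount of time" spent on local data: larger values mean fewer jumps and less communication, at the cost of the chain drifting toward the local shard's posterior between jumps. A real distributed version would run several such chains on separate workers and exchange chain states rather than data.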
