Paper information - Bayesian Posterior Sampling via Stochastic Gradient Fisher Scoring

In this paper we address the following question: “Can we approximately sample from a Bayesian posterior distribution if we are only allowed to touch a small mini-batch of data-items for every sample we generate?” An algorithm based on the Langevin equation with stochastic gradients (SGLD) was previously proposed to solve this, but its mixing rate was slow. By leveraging the Bayesian Central Limit Theorem, we extend the SGLD algorithm so that at high mixing rates it will sample from a normal approximation of the posterior, while for slow mixing rates it will mimic the behavior of SGLD with a pre-conditioner matrix. As a bonus, the proposed algorithm is reminiscent of Fisher scoring (with stochastic gradients) and as such is an efficient optimizer during burn-in.
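To make the mini-batch sampling idea concrete, here is a minimal sketch of the baseline SGLD update that the abstract refers to, not of the Fisher scoring extension proposed in the paper. Everything in it is an illustrative assumption: the toy model (Gaussian observations with unit variance and a Gaussian prior on the mean), the function names `sgld_gaussian_mean` and `exact_posterior`, the fixed step size, and the batch size. SGLD is usually described with a decreasing step size; a constant one is used here only to keep the example short, and it leaves the sample variance somewhat inflated by mini-batch gradient noise.

```python
import random


def sgld_gaussian_mean(data, batch_size=50, step_size=1e-4, n_steps=6000,
                       prior_var=100.0, seed=0):
    """Plain SGLD for the mean of a unit-variance Gaussian (toy model, assumed)."""
    rng = random.Random(seed)
    n_total = len(data)
    theta = 0.0
    samples = []
    for _ in range(n_steps):
        # Touch only a small mini-batch of data items per generated sample.
        batch = [data[rng.randrange(n_total)] for _ in range(batch_size)]
        # Gradient of the log prior N(0, prior_var) with respect to theta.
        grad_prior = -theta / prior_var
        # Mini-batch gradient of the log likelihood, rescaled to the full data set.
        grad_lik = (n_total / batch_size) * sum(x - theta for x in batch)
        # Langevin step: half-step along the gradient plus N(0, step_size) noise.
        noise = rng.gauss(0.0, step_size ** 0.5)
        theta += 0.5 * step_size * (grad_prior + grad_lik) + noise
        samples.append(theta)
    return samples


def exact_posterior(data, prior_var=100.0):
    """Closed-form posterior mean and variance for the same toy model."""
    precision = 1.0 / prior_var + len(data)
    return sum(data) / precision, 1.0 / precision
```

A possible usage, again with assumed values: draw 1000 synthetic observations around a true mean of 2.0, run the sampler, discard the first 1000 iterations as burn-in, and compare the average of the remaining samples with the posterior mean returned by `exact_posterior`. In this conjugate setting the posterior is exactly normal, which is the regime where the normal approximation mentioned in the abstract is natural.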

Ahn | J. Pineau

[1] L. L. Cam, et al. Asymptotic Methods in Statistical Decision Theory, 1986.

[2] V. Borkar. Stochastic approximation with two time scales, 1997.

[3] W. A. Scott. Maximum likelihood estimation using the empirical Fisher information matrix, 2002.

[4] M. Seeger. Low Rank Updates for the Cholesky Decomposition, 2004.

[5] Léon Bottou, et al. The Tradeoffs of Large Scale Learning, 2007, NIPS.

[6] Simon Günter, et al. A Stochastic Quasi-Newton Method for Online Convex Optimization, 2007, AISTATS.

[7] Christophe Andrieu, et al. A tutorial on adaptive MCMC, 2008, Stat. Comput.

[8] Yoshua Bengio, et al. Classification using discriminative restricted Boltzmann machines, 2008, ICML '08.

[9] M. Girolami. Riemann Manifold Langevin and Hamiltonian Monte Carlo, 2010.

[10] Radford M. Neal. Probabilistic Inference Using Markov Chain Monte Carlo Methods, 2011.

[11] Yee Whye Teh, et al. Bayesian Learning via Stochastic Gradient Langevin Dynamics, 2011, ICML.