Scalable Bayesian Inference via Particle Mirror Descent

Bayesian methods are appealing for their flexibility in modeling complex data and their ability to capture uncertainty in parameters. However, when Bayes' rule does not yield a closed-form posterior, most approximate Bayesian inference algorithms lack either scalability or rigorous guarantees. To tackle this challenge, we propose a simple yet scalable algorithm, Particle Mirror Descent (PMD), to iteratively approximate the posterior density. PMD is inspired by stochastic functional mirror descent, where one descends in the density space using a small batch of data points at each iteration, and by particle filtering, where one uses samples to approximate a function. We prove a first-of-its-kind result: after T iterations, PMD provides a posterior density estimator that converges to the true posterior in KL-divergence at rate O(1/√T). We show that PMD is competitive with several scalable Bayesian algorithms on mixture models, Bayesian logistic regression, sparse Gaussian processes, and latent Dirichlet allocation.

arXiv:1506.03101v1 [cs.LG], 9 Jun 2015
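
To make the idea of descending in density space with mini-batches and particles concrete, here is a minimal sketch in Python. It is not the algorithm as specified in the paper. It assumes a simplified variant in which the particles are drawn once from the prior and kept fixed, so that a mirror descent step with a KL proximity term reduces to a running average of rescaled mini-batch log-likelihoods on each particle. The function names (`pmd_fixed_particles`, `gaussian_log_lik`), the step size 1/t, and the Gaussian mean example are all illustrative assumptions.

```python
import numpy as np

def gaussian_log_lik(batch, theta):
    """Sum over the batch of log N(x | theta, 1), up to a constant.

    Returns one value per particle (shape: (n_particles,)).
    """
    diff = batch[:, None] - theta[None, :]
    return -0.5 * np.sum(diff ** 2, axis=0)

def pmd_fixed_particles(data, log_lik, prior_sampler, n_particles=2000,
                        n_iters=500, batch_size=10, seed=0):
    """Simplified particle mirror descent sketch (assumption: fixed particles).

    With q_t(theta) proportional to prior(theta) * exp(s_t(theta)), the
    mirror descent step
        q_{t+1} ∝ q_t^(1 - gamma) * prior^gamma * exp(gamma * g_t)
    becomes s_{t+1} = (1 - gamma) * s_t + gamma * g_t, where g_t is the
    mini-batch log-likelihood rescaled by n / batch_size.
    """
    rng = np.random.default_rng(seed)
    n = len(data)
    theta = prior_sampler(rng, n_particles)   # particles drawn from the prior
    s = np.zeros(n_particles)                 # log-weights relative to the prior
    for t in range(1, n_iters + 1):
        gamma = 1.0 / t                       # assumed step-size schedule
        batch = data[rng.integers(0, n, size=batch_size)]
        g = (n / batch_size) * log_lik(batch, theta)
        s = (1.0 - gamma) * s + gamma * g
    w = np.exp(s - s.max())                   # stabilized normalization
    return theta, w / w.sum()

# Usage: posterior over the mean of unit-variance Gaussian data,
# with a N(0, 3^2) prior (both choices are illustrative).
data = np.random.default_rng(1).normal(1.0, 1.0, size=200)
theta, w = pmd_fixed_particles(
    data, gaussian_log_lik, lambda rng, k: rng.normal(0.0, 3.0, size=k))
posterior_mean_estimate = float(np.sum(w * theta))
```

In this conjugate example the weighted particle mean can be compared with the closed-form posterior mean, sum(x) / (n + 1/9). Keeping the particles fixed is the main simplification here: it makes the update a simple reweighting, at the cost of needing prior samples that already cover the region where the posterior concentrates.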
