A Unified Particle-Optimization Framework for Scalable Bayesian Sampling

There has been recent interest in developing scalable Bayesian sampling methods such as stochastic gradient MCMC (SG-MCMC) and Stein variational gradient descent (SVGD) for big-data analysis. A standard SG-MCMC algorithm simulates samples from a discrete-time Markov chain to approximate a target distribution; consequently, the samples can be highly correlated, an undesirable property of SG-MCMC. In contrast, SVGD directly optimizes a set of particles to approximate a target distribution, and is thus able to obtain good approximations with far fewer samples. In this paper, we propose a principled particle-optimization framework based on Wasserstein gradient flows to unify SG-MCMC and SVGD, and to allow new algorithms to be developed. Our framework interprets SG-MCMC as particle optimization on the space of probability measures, revealing a strong connection between SG-MCMC and SVGD. The key components of our framework are several particle-approximation techniques that efficiently solve the original partial differential equations on the space of probability measures. Extensive experiments on both synthetic data and deep neural networks demonstrate the effectiveness and efficiency of our framework for scalable Bayesian sampling.
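
To make the contrast between the two families concrete, the following is a minimal sketch, not the framework proposed in the paper. It assumes a hypothetical one-dimensional Gaussian target N(2, 1), an exact (rather than stochastic minibatch) gradient, and a fixed-bandwidth RBF kernel; the names `grad_log_p`, `sgld_step`, and `svgd_step` are illustrative only. The Langevin-style update produces one correlated chain driven by injected noise, while the SVGD update moves a whole set of particles deterministically, with a kernel term that keeps them apart.

```python
import math
import random


def grad_log_p(x, mean=2.0):
    """Score of a toy 1-D Gaussian target N(mean, 1) (assumed for illustration)."""
    return -(x - mean)


def sgld_step(x, step, rng):
    """One Langevin update: gradient drift plus injected Gaussian noise.

    A real SG-MCMC method would use a minibatch gradient estimate here;
    this sketch uses the exact gradient of the toy target.
    """
    return x + step * grad_log_p(x) + math.sqrt(2.0 * step) * rng.gauss(0.0, 1.0)


def svgd_step(particles, step, bandwidth=1.0):
    """One SVGD update with an RBF kernel of fixed bandwidth (an assumption)."""
    n = len(particles)
    h2 = bandwidth ** 2
    updated = []
    for xi in particles:
        phi = 0.0
        for xj in particles:
            k = math.exp(-((xj - xi) ** 2) / (2.0 * h2))
            # First term pulls xi toward high density; the second term is the
            # kernel gradient with respect to xj, which pushes xi away from xj.
            phi += k * grad_log_p(xj) + (-(xj - xi) / h2) * k
        updated.append(xi + step * phi / n)
    return updated


rng = random.Random(0)

# SVGD: 20 particles optimized jointly and deterministically.
particles = [rng.uniform(-1.0, 1.0) for _ in range(20)]
for _ in range(500):
    particles = svgd_step(particles, step=0.1)
svgd_mean = sum(particles) / len(particles)

# Langevin chain: a single trajectory whose successive samples are correlated.
chain = [0.0]
for _ in range(5000):
    chain.append(sgld_step(chain[-1], step=0.05, rng=rng))
kept = chain[500:]  # discard burn-in
sgld_mean = sum(kept) / len(kept)

print("SVGD particle mean:", svgd_mean)
print("Langevin chain mean:", sgld_mean)
```

In this toy setting both estimates should land near the target mean of 2, but the SVGD estimate uses only 20 particles whereas the chain averages thousands of correlated draws, which is the trade-off the abstract describes.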
