Stochastic Particle-Optimization Sampling and the Non-Asymptotic Convergence Theory

Particle-optimization-based sampling (POS) is a recently developed, effective sampling technique that iteratively updates a set of interacting particles. A representative algorithm is Stein variational gradient descent (SVGD). We prove that, under certain conditions, SVGD suffers from a theoretical pitfall, {\it i.e.}, particles tend to collapse. As a remedy, we generalize POS to a stochastic setting by injecting random noise into particle updates, yielding stochastic particle-optimization sampling (SPOS). Notably, for the first time, we develop a {\em non-asymptotic convergence theory} for the SPOS framework (related to SVGD), characterizing algorithm convergence in terms of the 1-Wasserstein distance with respect to the number of particles and the number of iterations. Somewhat surprisingly, our theory suggests that, with the same (not too large) number of updates per particle, adopting more particles does not necessarily lead to a better approximation of the target distribution, due to the limited computational budget and numerical errors. This phenomenon is also observed in SVGD and verified via an experiment on synthetic data. Extensive experimental results verify our theory and demonstrate the effectiveness of the proposed framework.
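To make the abstract's description concrete, the update it refers to (the SVGD transport direction plus injected Gaussian noise) can be sketched in a few lines of NumPy. This is a minimal sketch assuming an RBF kernel with a fixed bandwidth; the function and parameter names (`spos_step`, `eps`, `beta_inv`, `h`) are illustrative, and the exact SPOS drift analyzed in the paper may contain additional terms beyond this simplified form.

```python
import numpy as np

def rbf_kernel(theta, h=1.0):
    """RBF kernel matrix K[i, j] = k(theta_i, theta_j) and the SVGD repulsive term."""
    diff = theta[:, None, :] - theta[None, :, :]                # diff[i, j] = theta_i - theta_j, shape (M, M, d)
    K = np.exp(-np.sum(diff ** 2, axis=-1) / (2.0 * h ** 2))    # (M, M)
    # For the RBF kernel, grad_{theta_j} k(theta_j, theta_i) = (theta_i - theta_j) / h^2 * k.
    repulsion = np.sum(diff / h ** 2 * K[:, :, None], axis=1)   # (M, d), summed over j
    return K, repulsion

def spos_step(theta, grad_log_p, eps=1e-2, beta_inv=0.1, h=1.0, rng=None):
    """One stochastic particle-optimization update: SVGD drift plus injected Gaussian noise.

    theta      : (M, d) array of particles
    grad_log_p : callable mapping (M, d) particles to (M, d) gradients of the log target density
    """
    rng = np.random.default_rng() if rng is None else rng
    M = theta.shape[0]
    K, repulsion = rbf_kernel(theta, h)
    score = grad_log_p(theta)                                   # (M, d)
    drift = (K @ score + repulsion) / M                         # standard SVGD transport direction
    # The injected noise is what turns the deterministic SVGD update into a stochastic one.
    noise = np.sqrt(2.0 * eps * beta_inv) * rng.standard_normal(theta.shape)
    return theta + eps * drift + noise

# Example usage: sampling from a 2-D standard normal, where grad log p(theta) = -theta.
theta = np.random.default_rng(0).standard_normal((50, 2)) * 3.0
for _ in range(500):
    theta = spos_step(theta, lambda t: -t)
```

Setting `beta_inv` to zero recovers plain SVGD in this sketch, which is one way to see the noise injection as the distinguishing ingredient of the stochastic variant.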
