Towards Characterizing the High-dimensional Bias of Kernel-based Particle Inference Algorithms

Particle-based inference algorithms are a promising approach to efficiently generating samples from an intractable target distribution by iteratively updating a set of particles. As a notable example, Stein variational gradient descent (SVGD) provides a deterministic and computationally efficient update, but it is known to underestimate the variance in high dimensions, and the mechanism behind this bias is poorly understood. In this work we explore a connection between SVGD and a maximum mean discrepancy (MMD)-based inference algorithm via Stein's lemma. By comparing the two update rules, we identify the source of bias in SVGD as a combination of high variance and deterministic bias, and we empirically demonstrate that removing either factor leads to accurate estimation. In addition, for learning a high-dimensional Gaussian target, we analytically derive the variance at convergence for both algorithms and confirm that only SVGD suffers from the curse of dimensionality.
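To make the SVGD update concrete, below is a minimal NumPy sketch of one iteration using a fixed-bandwidth RBF kernel. The function names, step size, and bandwidth are illustrative assumptions rather than choices taken from the paper (in practice a median-heuristic bandwidth is common); the update rule itself is the standard SVGD direction, which transports each particle along a kernel-weighted average of the target score plus a repulsive kernel-gradient term.

```python
# Minimal sketch of one SVGD iteration, assuming an RBF kernel with a
# fixed bandwidth `h`; names like `svgd_step` are illustrative only.
import numpy as np

def rbf_kernel(X, h):
    """Pairwise RBF kernel: k(x_a, x_b) = exp(-||x_a - x_b||^2 / (2 h^2))."""
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / (2.0 * h ** 2))

def svgd_step(X, score, h=1.0, step_size=0.1):
    """One deterministic SVGD update on particles X of shape (n, d).

    phi(x_i) = (1/n) sum_j [ k(x_j, x_i) score(x_j) + grad_{x_j} k(x_j, x_i) ]
    """
    n = X.shape[0]
    K = rbf_kernel(X, h)                     # (n, n), K[j, i] = k(x_j, x_i)
    grads = score(X)                         # (n, d), score(x) = grad log p(x)
    # For the RBF kernel, grad_{x_j} k(x_j, x_i) = (x_i - x_j) / h^2 * k(x_j, x_i);
    # diffs[j, i] = x_i - x_j.
    diffs = X[None, :, :] - X[:, None, :]
    kernel_grads = (K[:, :, None] * diffs / h ** 2).sum(axis=0)  # sum over j
    phi = (K.T @ grads + kernel_grads) / n   # driving term + repulsion
    return X + step_size * phi

# Example: a 2-D standard Gaussian target, whose score is simply -x.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2)) * 3.0          # deliberately overdispersed start
for _ in range(500):
    X = svgd_step(X, score=lambda X: -X)
```

The second term of `phi` acts as a repulsive force that keeps the particles spread out; the paper's analysis concerns how the balance between this term and the score term degrades as the dimension grows, producing the variance underestimation described above.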
