Towards Characterizing the High-dimensional Bias of Kernel-based Particle Inference Algorithms

Particle-based inference algorithms are a promising approach to efficiently generating samples from an intractable target distribution by iteratively updating a set of particles. As a notable example, Stein variational gradient descent (SVGD) provides a deterministic and computationally efficient update, but it is known to underestimate the variance in high dimensions, and the mechanism behind this bias is poorly understood. In this work we explore a connection between SVGD and a maximum mean discrepancy (MMD)-based inference algorithm via Stein's lemma. By comparing the two update rules, we identify the source of bias in SVGD as a combination of high variance and deterministic bias, and we empirically demonstrate that removing either factor leads to accurate estimation. In addition, for learning a high-dimensional Gaussian target, we analytically derive the variance at convergence for both algorithms and confirm that only SVGD suffers from the curse of dimensionality.
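To make the SVGD update concrete, below is a minimal NumPy sketch of one iteration using a fixed-bandwidth RBF kernel. The function names, step size, and bandwidth are illustrative assumptions rather than choices taken from the paper (in practice a median-heuristic bandwidth is common); the update rule itself is the standard SVGD direction, which transports each particle along a kernel-weighted average of the target score plus a repulsive kernel-gradient term.

```python
# Minimal sketch of one SVGD iteration, assuming an RBF kernel with a
# fixed bandwidth `h`; names like `svgd_step` are illustrative only.
import numpy as np

def rbf_kernel(X, h):
    """Pairwise RBF kernel: k(x_a, x_b) = exp(-||x_a - x_b||^2 / (2 h^2))."""
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / (2.0 * h ** 2))

def svgd_step(X, score, h=1.0, step_size=0.1):
    """One deterministic SVGD update on particles X of shape (n, d).

    phi(x_i) = (1/n) sum_j [ k(x_j, x_i) score(x_j) + grad_{x_j} k(x_j, x_i) ]
    """
    n = X.shape[0]
    K = rbf_kernel(X, h)                     # (n, n), K[j, i] = k(x_j, x_i)
    grads = score(X)                         # (n, d), score(x) = grad log p(x)
    # For the RBF kernel, grad_{x_j} k(x_j, x_i) = (x_i - x_j) / h^2 * k(x_j, x_i);
    # diffs[j, i] = x_i - x_j.
    diffs = X[None, :, :] - X[:, None, :]
    kernel_grads = (K[:, :, None] * diffs / h ** 2).sum(axis=0)  # sum over j
    phi = (K.T @ grads + kernel_grads) / n   # driving term + repulsion
    return X + step_size * phi

# Example: a 2-D standard Gaussian target, whose score is simply -x.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2)) * 3.0          # deliberately overdispersed start
for _ in range(500):
    X = svgd_step(X, score=lambda X: -X)
```

The second term of `phi` acts as a repulsive force that keeps the particles spread out; the paper's analysis concerns how the balance between this term and the score term degrades as the dimension grows, producing the variance underestimation described above.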
