Stein Point Markov Chain Monte Carlo

An important task in machine learning and statistics is the approximation of a probability measure by an empirical measure supported on a discrete point set. Stein Points are a class of algorithms for this task, which proceed by sequentially minimising a Stein discrepancy between the empirical measure and the target, and hence require the solution of a non-convex optimisation problem to obtain each new point. This paper removes the need to solve this optimisation problem by instead selecting each new point from a Markov chain sample path. This significantly reduces the computational cost of Stein Points and leads to a suite of algorithms that are straightforward to implement. The new algorithms are illustrated on a set of challenging Bayesian inference problems, and rigorous theoretical guarantees of consistency are established.
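The abstract describes the method only at a high level, so the following is a minimal illustrative sketch of the general idea rather than the paper's exact algorithm. It assumes a Langevin Stein kernel built from a Gaussian base kernel and a short random-walk Metropolis chain to generate candidate points (the paper considers several chains and selection rules); the function names (stein_kernel, sp_mcmc) and all parameter choices here are hypothetical.

```python
import numpy as np

def stein_kernel(x, y, score, ell=1.0):
    """Langevin Stein kernel k_0 built from a Gaussian base kernel k.

    k_0(x, y) = div_x div_y k + <grad_x k, s(y)> + <grad_y k, s(x)>
                + k(x, y) <s(x), s(y)>,
    where s = grad log p is the score function of the target.
    """
    d = x.size
    diff = x - y
    k = np.exp(-np.dot(diff, diff) / (2 * ell**2))
    gx = -diff / ell**2 * k                                   # grad_x k
    gy = diff / ell**2 * k                                    # grad_y k
    div = k * (d / ell**2 - np.dot(diff, diff) / ell**4)      # div_x div_y k
    sx, sy = score(x), score(y)
    return div + np.dot(gx, sy) + np.dot(gy, sx) + k * np.dot(sx, sy)

def sp_mcmc(score, log_p, x0, n_points=50, chain_len=20, step=0.5, rng=None):
    """Greedy Stein Point selection over Metropolis-chain candidates."""
    rng = np.random.default_rng() if rng is None else rng
    points = [np.asarray(x0, dtype=float)]
    for _ in range(n_points - 1):
        # Run a short random-walk Metropolis chain from the last point.
        x = points[-1].copy()
        candidates = []
        for _ in range(chain_len):
            prop = x + step * rng.standard_normal(x.size)
            if np.log(rng.uniform()) < log_p(prop) - log_p(x):
                x = prop
            candidates.append(x.copy())
        # Greedy step: adding y changes the KSD double sum by
        # k_0(y, y) + 2 * sum_i k_0(x_i, y); pick the minimiser.
        def ksd_increment(y):
            return stein_kernel(y, y, score) + 2 * sum(
                stein_kernel(xi, y, score) for xi in points)
        points.append(min(candidates, key=ksd_increment))
    return np.stack(points)

# Example: approximate a standard 2-D Gaussian target.
log_p = lambda x: -0.5 * np.dot(x, x)
score = lambda x: -x
pts = sp_mcmc(score, log_p, x0=np.zeros(2))
```

Note that each greedy step needs only the increment k_0(y, y) + 2 * sum_i k_0(x_i, y) to the kernel Stein discrepancy, so evaluating a candidate costs O(n) Stein kernel evaluations against the points already selected, with no non-convex optimisation involved.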
