Stein Point Markov Chain Monte Carlo

An important task in machine learning and statistics is the approximation of a probability measure by an empirical measure supported on a discrete point set. Stein Points are a class of algorithms for this task, which proceed by sequentially minimising a Stein discrepancy between the empirical measure and the target, and hence require the solution of a non-convex optimisation problem to obtain each new point. This paper removes the need to solve this optimisation problem by instead selecting each new point from a Markov chain sample path. This significantly reduces the computational cost of Stein Points and leads to a suite of algorithms that are straightforward to implement. The new algorithms are illustrated on a set of challenging Bayesian inference problems, and rigorous theoretical guarantees of consistency are established.
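The abstract describes the method only at a high level, so the following is a minimal illustrative sketch of the general idea rather than the paper's exact algorithm. It assumes a Langevin Stein kernel built from a Gaussian base kernel and a short random-walk Metropolis chain to generate candidate points (the paper considers several chains and selection rules); the function names (stein_kernel, sp_mcmc) and all parameter choices here are hypothetical.

```python
import numpy as np

def stein_kernel(x, y, score, ell=1.0):
    """Langevin Stein kernel k_0 built from a Gaussian base kernel k.

    k_0(x, y) = div_x div_y k + <grad_x k, s(y)> + <grad_y k, s(x)>
                + k(x, y) <s(x), s(y)>,
    where s = grad log p is the score function of the target.
    """
    d = x.size
    diff = x - y
    k = np.exp(-np.dot(diff, diff) / (2 * ell**2))
    gx = -diff / ell**2 * k                                   # grad_x k
    gy = diff / ell**2 * k                                    # grad_y k
    div = k * (d / ell**2 - np.dot(diff, diff) / ell**4)      # div_x div_y k
    sx, sy = score(x), score(y)
    return div + np.dot(gx, sy) + np.dot(gy, sx) + k * np.dot(sx, sy)

def sp_mcmc(score, log_p, x0, n_points=50, chain_len=20, step=0.5, rng=None):
    """Greedy Stein Point selection over Metropolis-chain candidates."""
    rng = np.random.default_rng() if rng is None else rng
    points = [np.asarray(x0, dtype=float)]
    for _ in range(n_points - 1):
        # Run a short random-walk Metropolis chain from the last point.
        x = points[-1].copy()
        candidates = []
        for _ in range(chain_len):
            prop = x + step * rng.standard_normal(x.size)
            if np.log(rng.uniform()) < log_p(prop) - log_p(x):
                x = prop
            candidates.append(x.copy())
        # Greedy step: adding y changes the KSD double sum by
        # k_0(y, y) + 2 * sum_i k_0(x_i, y); pick the minimiser.
        def ksd_increment(y):
            return stein_kernel(y, y, score) + 2 * sum(
                stein_kernel(xi, y, score) for xi in points)
        points.append(min(candidates, key=ksd_increment))
    return np.stack(points)

# Example: approximate a standard 2-D Gaussian target.
log_p = lambda x: -0.5 * np.dot(x, x)
score = lambda x: -x
pts = sp_mcmc(score, log_p, x0=np.zeros(2))
```

Note that each greedy step needs only the increment k_0(y, y) + 2 * sum_i k_0(x_i, y) to the kernel Stein discrepancy, so evaluating a candidate costs O(n) Stein kernel evaluations against the points already selected, with no non-convex optimisation involved.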
