Optimal Quantisation of Probability Measures Using Maximum Mean Discrepancy

Several researchers have proposed minimisation of maximum mean discrepancy (MMD) as a method to quantise probability measures, i.e., to approximate a target distribution by a representative point set. Here we consider sequential algorithms that greedily minimise MMD over a discrete candidate set. We propose a novel non-myopic algorithm and, in order to both improve statistical efficiency and reduce computational cost, we investigate a variant that applies this technique to a mini-batch of the candidate set at each iteration. When the candidate points are sampled from the target, the consistency of these new algorithms, and of their mini-batch variants, is established. We demonstrate the algorithms on a range of important computational problems, including optimisation of nodes in Bayesian cubature and the thinning of Markov chain output.
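As a rough illustration of sequential greedy MMD minimisation over a discrete candidate set, the sketch below implements only the basic one-point-at-a-time (myopic) selection; it is not the non-myopic or mini-batch algorithm proposed in the paper. It assumes the target is represented by the empirical measure of the candidates, uses a one-dimensional Gaussian kernel as an arbitrary choice, and allows a candidate to be selected more than once. The names `gaussian_kernel`, `mmd_squared` and `greedy_mmd` are hypothetical and introduced here for illustration.

```python
import math
import random


def gaussian_kernel(x, y, lengthscale=1.0):
    """Gaussian kernel on the real line (an illustrative choice of kernel)."""
    return math.exp(-((x - y) ** 2) / (2.0 * lengthscale ** 2))


def mmd_squared(points, candidates, kernel):
    """Squared MMD between the empirical measures of two point sets."""
    m, n = len(points), len(candidates)
    xx = sum(kernel(a, b) for a in points for b in points) / m ** 2
    xy = sum(kernel(a, b) for a in points for b in candidates) / (m * n)
    yy = sum(kernel(a, b) for a in candidates for b in candidates) / n ** 2
    return xx - 2.0 * xy + yy


def greedy_mmd(candidates, n_select, kernel):
    """Pick candidate indices one at a time, each minimising MMD given earlier picks."""
    n = len(candidates)
    gram = [[kernel(a, b) for b in candidates] for a in candidates]
    # Kernel mean embedding of the candidate empirical measure, at each candidate.
    mean_embedding = [sum(row) / n for row in gram]
    # running[j] accumulates k(x_i, y_j) over the points x_i selected so far.
    running = [0.0] * n
    selected = []
    for step in range(1, n_select + 1):
        # Terms of step^2 * MMD^2 / 2 that depend on the new point y_j.
        scores = [
            0.5 * gram[j][j] + running[j] - step * mean_embedding[j]
            for j in range(n)
        ]
        best = min(range(n), key=scores.__getitem__)
        selected.append(best)
        for j in range(n):
            running[j] += gram[best][j]
    return selected


# Usage: candidates are drawn from the target (here a standard normal).
rng = random.Random(0)
candidates = [rng.gauss(0.0, 1.0) for _ in range(200)]
chosen = greedy_mmd(candidates, 10, gaussian_kernel)
greedy_points = [candidates[j] for j in chosen]
print(mmd_squared(greedy_points, candidates, gaussian_kernel))
```

The Gram matrix is computed once, so each greedy step costs time linear in the number of candidates. Restricting the search at each step to a random subset of candidates would be one way to approximate the mini-batch idea mentioned in the abstract, though the details here should not be read as the authors' method.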

Marina Riabiz | Jackson Gorham | Onur Teymur | Chris J. Oates
