Optimal Quantisation of Probability Measures Using Maximum Mean Discrepancy

Several researchers have proposed minimisation of maximum mean discrepancy (MMD) as a method to quantise probability measures, i.e., to approximate a target distribution by a representative point set. Here we consider sequential algorithms that greedily minimise MMD over a discrete candidate set. We propose a novel non-myopic algorithm and, to both improve statistical efficiency and reduce computational cost, we investigate a variant that applies this technique to a mini-batch of the candidate set at each iteration. When the candidate points are sampled from the target, the consistency of these new algorithms, and of their mini-batch variants, is established. We demonstrate the algorithms on a range of important computational problems, including optimisation of nodes in Bayesian cubature and the thinning of Markov chain output.
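
To make the setting concrete, the following is a minimal sketch of the simplest (myopic) greedy scheme: one point is added per iteration, chosen from a discrete candidate set so as to minimise the squared MMD to the target. It is not the non-myopic or mini-batch algorithm proposed in the paper. Several choices are assumptions made only for illustration: the Gaussian kernel and its lengthscale, the use of the empirical measure of the candidates as a stand-in for the target, and the function names `gaussian_kernel`, `greedy_mmd` and `mmd_squared`.

```python
import numpy as np


def gaussian_kernel(x, y, lengthscale=1.0):
    """Gaussian kernel matrix between rows of x (n, d) and y (m, d). Assumed kernel."""
    sq_dist = np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dist / (2.0 * lengthscale ** 2))


def greedy_mmd(candidates, n_select, lengthscale=1.0):
    """Myopic greedy MMD minimisation over a discrete candidate set.

    Assumption: the target is approximated by the empirical measure of
    `candidates`. A candidate may be selected more than once.
    """
    K = gaussian_kernel(candidates, candidates, lengthscale)
    mean_embedding = K.mean(axis=1)       # kernel mean of the target at each candidate
    diag = np.diag(K)                     # k(x_i, x_i)
    running = np.zeros(len(candidates))   # sum over selected j of k(x_i, x_j)
    selected = []
    for _ in range(n_select):
        m = len(selected) + 1
        # Squared MMD after adding candidate i, dropping terms that do not depend on i.
        scores = (diag + 2.0 * running) / m ** 2 - 2.0 * mean_embedding / m
        i = int(np.argmin(scores))
        selected.append(i)
        running += K[:, i]
    return selected


def mmd_squared(candidates, idx, lengthscale=1.0):
    """Squared MMD between the equally weighted points `idx` and the empirical target."""
    K = gaussian_kernel(candidates, candidates, lengthscale)
    sub = K[np.ix_(idx, idx)]
    return sub.mean() - 2.0 * K[idx, :].mean() + K.mean()


rng = np.random.default_rng(0)
candidates = rng.normal(size=(200, 2))    # stand-in for samples from the target
chosen = greedy_mmd(candidates, n_select=10)
print(chosen, mmd_squared(candidates, chosen))
```

This sketch forms the full kernel matrix over the candidate set, so its cost grows quadratically in the number of candidates; restricting each iteration to a mini-batch of candidates, as the abstract describes, is one way to reduce that cost.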
