Kernel Thinning

We introduce kernel thinning, a new procedure for compressing a distribution P more effectively than i.i.d. sampling or standard thinning. Given a suitable reproducing kernel k and O(n) time, kernel thinning compresses an n-point approximation to P into a √n-point approximation with comparable worst-case integration error in the associated reproducing kernel Hilbert space. With high probability, the maximum discrepancy in integration error is O_d(n^{-1/2} √(log n)) for compactly supported P and O_d(n^{-1/2} √((log n)^{d+1} log log n)) for sub-exponential P on ℝ^d. In contrast, an equal-sized i.i.d. sample from P suffers Ω(n^{-1/4}) integration error. Our sub-exponential guarantees resemble the classical quasi-Monte Carlo error rates for uniform P on [0, 1]^d but apply to general distributions on ℝ^d and a wide range of common kernels. We use our results to derive explicit non-asymptotic maximum mean discrepancy bounds for Gaussian, Matérn, and B-spline kernels and present two vignettes illustrating the practical benefits of kernel thinning over i.i.d. sampling and standard Markov chain Monte Carlo thinning.
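The worst-case integration error referred to above is the maximum mean discrepancy (MMD) between the empirical measure of the n input points and that of the √n-point coreset. As a point of reference, the following sketch computes the MMD of the i.i.d.-subsampling baseline under a Gaussian kernel; it is an illustration of the quantity being bounded, not of the kernel thinning algorithm itself, and the kernel bandwidth, sample sizes, and target distribution are arbitrary choices for the example.

```python
import numpy as np

def gaussian_kernel(A, B, sigma=1.0):
    # Pairwise Gaussian kernel matrix: k(a, b) = exp(-||a - b||^2 / (2 sigma^2)).
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2.0 * A @ B.T
    return np.exp(-d2 / (2.0 * sigma**2))

def mmd(X, S, sigma=1.0):
    # MMD between the empirical measures of X and S:
    # MMD^2 = mean k(X, X) - 2 mean k(X, S) + mean k(S, S);
    # clip at zero to guard against tiny negative values from floating point.
    m2 = (gaussian_kernel(X, X, sigma).mean()
          - 2.0 * gaussian_kernel(X, S, sigma).mean()
          + gaussian_kernel(S, S, sigma).mean())
    return np.sqrt(max(m2, 0.0))

rng = np.random.default_rng(0)
n = 1024
X = rng.normal(size=(n, 2))                 # n-point input approximation to P
m = int(np.sqrt(n))                          # coreset size sqrt(n) = 32
S = X[rng.choice(n, m, replace=False)]       # i.i.d.-subsampling baseline
print(mmd(X, S))  # the Omega(n^{-1/4}) baseline error kernel thinning improves on
```

Kernel thinning selects the √n coreset non-uniformly so that this same MMD shrinks at the faster, near n^{-1/2} rate quoted in the abstract.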
