Kernel Thinning

We introduce kernel thinning, a new procedure for compressing a distribution P more effectively than i.i.d. sampling or standard thinning. Given a suitable reproducing kernel k and O(n) time, kernel thinning compresses an n-point approximation to P into a √n-point approximation with comparable worst-case integration error in the associated reproducing kernel Hilbert space. With high probability, the maximum discrepancy in integration error is O_d(n^{-1/2} √(log n)) for compactly supported P and O_d(n^{-1/2} √((log n)^{d+1} log log n)) for sub-exponential P on ℝ^d. In contrast, an equal-sized i.i.d. sample from P suffers Ω(n^{-1/4}) integration error. Our sub-exponential guarantees resemble the classical quasi-Monte Carlo error rates for uniform P on [0, 1]^d but apply to general distributions on ℝ^d and a wide range of common kernels. We use our results to derive explicit non-asymptotic maximum mean discrepancy bounds for Gaussian, Matérn, and B-spline kernels and present two vignettes illustrating the practical benefits of kernel thinning over i.i.d. sampling and standard Markov chain Monte Carlo thinning.
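The worst-case integration error referred to above is the maximum mean discrepancy (MMD) between the empirical measure of the n input points and that of the √n-point coreset. As a point of reference, the following sketch computes the MMD of the i.i.d.-subsampling baseline under a Gaussian kernel; it is an illustration of the quantity being bounded, not of the kernel thinning algorithm itself, and the kernel bandwidth, sample sizes, and target distribution are arbitrary choices for the example.

```python
import numpy as np

def gaussian_kernel(A, B, sigma=1.0):
    # Pairwise Gaussian kernel matrix: k(a, b) = exp(-||a - b||^2 / (2 sigma^2)).
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2.0 * A @ B.T
    return np.exp(-d2 / (2.0 * sigma**2))

def mmd(X, S, sigma=1.0):
    # MMD between the empirical measures of X and S:
    # MMD^2 = mean k(X, X) - 2 mean k(X, S) + mean k(S, S);
    # clip at zero to guard against tiny negative values from floating point.
    m2 = (gaussian_kernel(X, X, sigma).mean()
          - 2.0 * gaussian_kernel(X, S, sigma).mean()
          + gaussian_kernel(S, S, sigma).mean())
    return np.sqrt(max(m2, 0.0))

rng = np.random.default_rng(0)
n = 1024
X = rng.normal(size=(n, 2))                 # n-point input approximation to P
m = int(np.sqrt(n))                          # coreset size sqrt(n) = 32
S = X[rng.choice(n, m, replace=False)]       # i.i.d.-subsampling baseline
print(mmd(X, S))  # the Omega(n^{-1/4}) baseline error kernel thinning improves on
```

Kernel thinning selects the √n coreset non-uniformly so that this same MMD shrinks at the faster, near n^{-1/2} rate quoted in the abstract.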
