Convergence Guarantees for Adaptive Bayesian Quadrature Methods

Adaptive Bayesian quadrature (ABQ) is an approach to numerical integration that empirically compares favorably with Monte Carlo integration on problems of medium dimensionality, where non-adaptive quadrature is not competitive. Its key ingredient is an acquisition function that changes as a function of previously collected values of the integrand. While this adaptivity appears to be empirically powerful, it complicates analysis. Consequently, no theoretical guarantees have so far been available for this class of methods. In this work, we prove consistency for a broad class of adaptive Bayesian quadrature methods and derive non-tight but informative convergence rates. To do so, we introduce a new concept that we call weak adaptivity. Our results identify a large and flexible class of adaptive Bayesian quadrature rules as consistent, within which practitioners can develop empirically efficient methods.
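
To make the notion of a data-dependent acquisition function concrete, the following is a minimal sketch of one adaptive Bayesian quadrature loop in one dimension. It is an illustration under stated assumptions, not the method analyzed in this work: the Gaussian kernel, the standard normal integration measure, the candidate grid, the constant `eps`, and the particular acquisition (posterior variance times squared posterior mean) are all choices made here for illustration, and the names `gauss_kernel`, `kernel_mean`, and `adaptive_bq` are hypothetical.

```python
import numpy as np


def gauss_kernel(a, b, ell=1.0):
    """Gaussian kernel matrix k(a_i, b_j) = exp(-(a_i - b_j)^2 / (2 ell^2))."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / ell) ** 2)


def kernel_mean(x, ell=1.0):
    """Closed form of the integral of k(x, y) against the N(0, 1) density in y."""
    return ell / np.sqrt(ell**2 + 1.0) * np.exp(-0.5 * x**2 / (ell**2 + 1.0))


def adaptive_bq(f, n_evals=12, ell=1.0, eps=1e-3, jitter=1e-8):
    """Estimate the integral of f against N(0, 1) with greedily chosen nodes.

    Assumed acquisition (illustrative only): var(x) * (mean(x)^2 + eps).
    It depends on the observed values of f through the posterior mean,
    which is what makes the rule adaptive.
    """
    grid = np.linspace(-4.0, 4.0, 161)  # candidate nodes (assumption)
    nodes = np.array([0.0])             # arbitrary starting node
    values = np.array([f(0.0)])

    while len(nodes) < n_evals:
        K = gauss_kernel(nodes, nodes, ell) + jitter * np.eye(len(nodes))
        k_star = gauss_kernel(grid, nodes, ell)        # shape (G, n)
        solved = np.linalg.solve(K, k_star.T)          # shape (n, G)
        mean = solved.T @ values                       # GP posterior mean
        var = 1.0 - np.sum(k_star * solved.T, axis=1)  # GP posterior variance
        var = np.clip(var, 0.0, None)                  # guard against round-off
        acq = var * (mean**2 + eps)
        x_next = grid[np.argmax(acq)]
        nodes = np.append(nodes, x_next)
        values = np.append(values, f(x_next))

    # Bayesian quadrature estimate: kernel-mean vector times K^{-1} times values.
    K = gauss_kernel(nodes, nodes, ell) + jitter * np.eye(len(nodes))
    weights = np.linalg.solve(K, kernel_mean(nodes, ell))
    return float(weights @ values), nodes


if __name__ == "__main__":
    f = lambda x: np.exp(-0.5 * (x - 0.5) ** 2)
    estimate, nodes = adaptive_bq(f)
    exact = np.exp(-0.0625) / np.sqrt(2.0)  # closed-form value for this f
    print(estimate, exact)
    print(np.sort(nodes))
```

Replacing `mean**2 + eps` with a constant would remove the dependence on the observed values and reduce the rule to a non-adaptive, variance-driven node selection, which is one way to see what the adaptivity contributes in this sketch.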
