Frank-Wolfe Bayesian Quadrature: Probabilistic Integration with Theoretical Guarantees

There is renewed interest in formulating integration as a statistical inference problem, motivated by the prospect of obtaining a full distribution over numerical error that can be propagated through subsequent computation. Current methods, such as Bayesian Quadrature, demonstrate impressive empirical performance but lack theoretical analysis. An important challenge is therefore to reconcile these probabilistic integrators with rigorous convergence guarantees. In this paper, we present the first probabilistic integrator that admits such theoretical treatment, called Frank-Wolfe Bayesian Quadrature (FWBQ). Under FWBQ, the estimate is shown to converge to the true value of the integral at up to an exponential rate, and the posterior is proven to contract at up to a super-exponential rate. In simulations, FWBQ is competitive with state-of-the-art methods and outperforms alternatives based on Frank-Wolfe optimisation. Our approach successfully quantifies numerical error in the solution to a challenging Bayesian model choice problem in cellular biology.
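To make the idea concrete, the following is a minimal sketch of how Frank-Wolfe point selection can be combined with Bayesian Quadrature weights. It is an illustration under stated assumptions, not the authors' implementation: the Gaussian kernel, the standard normal integration measure, the candidate grid used for the linear minimisation step, the step size 1/(i+1), the jitter term, and all function names are choices made here for the example.

```python
import numpy as np

def kernel(x, y, ell=1.0):
    # Gaussian kernel k(x, y) = exp(-(x - y)^2 / (2 ell^2)); an assumed choice.
    return np.exp(-(x[:, None] - y[None, :]) ** 2 / (2 * ell ** 2))

def kernel_mean(x, ell=1.0):
    # Closed-form kernel mean under N(0, 1): integral of k(x, y) N(y; 0, 1) dy.
    return ell / np.sqrt(ell ** 2 + 1) * np.exp(-x ** 2 / (2 * (ell ** 2 + 1)))

def frank_wolfe_points(n, candidates, ell=1.0):
    # Frank-Wolfe on the squared RKHS distance to the kernel mean,
    # with the linear minimisation step restricted to a candidate grid.
    mu = kernel_mean(candidates, ell)
    Kc = kernel(candidates, candidates, ell)
    g = np.zeros_like(candidates)  # current iterate evaluated on the grid
    chosen = []
    for i in range(n):
        j = int(np.argmax(mu - g))  # minimises <g - mu, k(., x)> over the grid
        chosen.append(j)
        rho = 1.0 / (i + 1)         # assumed step size (gives uniform weights)
        g = (1 - rho) * g + rho * Kc[:, j]
    return candidates[chosen]

def bayesian_quadrature(f, points, ell=1.0, jitter=1e-8):
    # Replace the Frank-Wolfe weights with Bayesian Quadrature weights K^{-1} z.
    x = np.unique(points)           # drop repeated points so K stays invertible
    K = kernel(x, x, ell) + jitter * np.eye(len(x))
    z = kernel_mean(x, ell)
    w = np.linalg.solve(K, z)
    mean = float(w @ f(x))
    prior_var = ell / np.sqrt(ell ** 2 + 2)  # double integral of k under N(0, 1)
    var = max(prior_var - float(z @ w), 0.0)  # clip small negative round-off
    return mean, var

grid = np.linspace(-5.0, 5.0, 1001)
points = frank_wolfe_points(20, grid)
estimate, variance = bayesian_quadrature(np.cos, points)
```

Here `estimate` approximates the integral of cos(x) against the standard normal density, whose exact value is exp(-1/2), and `variance` is the posterior variance that serves as the probabilistic measure of numerical error.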
