Probabilistic Integration: A Role for Statisticians in Numerical Analysis?

A research frontier has emerged in scientific computation, founded on the principle that numerical error entails epistemic uncertainty that ought to be subjected to statistical analysis. This viewpoint raises several interesting challenges, including the design of statistical methods that enable the coherent propagation of probabilities through a (possibly deterministic) computational pipeline. This paper thoroughly examines the case for probabilistic numerical methods in statistical computation and presents a specific case study for Markov chain and quasi-Monte Carlo methods. A probabilistic integrator is equipped with a full distribution over its output, providing a measure of epistemic uncertainty that is shown to be statistically valid at finite computational levels, as well as in asymptotic regimes. The approach is motivated by expensive integration problems, where, as in kriging, one is willing to expend, at worst, cubic computational effort in order to gain uncertainty quantification. There, probabilistic integrators enjoy the "best of both worlds", leveraging the sampling efficiency of Monte Carlo methods whilst providing a principled route to assessing the impact of numerical error on scientific conclusions. Several substantial applications are provided for illustration and critical evaluation, including examples from statistical modelling, computer graphics and uncertainty quantification in oil reservoir modelling.
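To make the idea of an integrator with "a full distribution over its output" concrete, the following is a minimal sketch of a Gaussian-process (Bayesian quadrature) integrator, not the authors' implementation. It assumes, purely for illustration, a zero-mean Gaussian process prior with a Gaussian kernel of fixed length-scale `ell`, integration against the standard normal distribution (for which the kernel mean is available in closed form), and a small deterministic grid of nodes; the names `gauss_kernel` and `bayesian_quadrature` are hypothetical. The Cholesky factorisation is where the cubic cost in the number of nodes arises.

```python
import numpy as np


def gauss_kernel(x, y, ell):
    """Gaussian (squared-exponential) kernel matrix between 1-D point sets."""
    d = x[:, None] - y[None, :]
    return np.exp(-0.5 * (d / ell) ** 2)


def bayesian_quadrature(f, nodes, ell=1.0, jitter=1e-10):
    """Posterior mean and variance of the integral of f against N(0, 1).

    Assumes a zero-mean GP prior on f with a Gaussian kernel (illustrative
    choice); returns (posterior mean, posterior variance, prior variance).
    """
    n = len(nodes)
    K = gauss_kernel(nodes, nodes, ell) + jitter * np.eye(n)
    # Kernel mean z_i = integral of k(x_i, y) against N(y; 0, 1), closed form.
    z = ell / np.sqrt(ell**2 + 1.0) * np.exp(-0.5 * nodes**2 / (ell**2 + 1.0))
    # Prior variance of the integral: double integral of the kernel.
    prior_var = ell / np.sqrt(ell**2 + 2.0)
    # Cholesky factorisation: the O(n^3) step.
    L = np.linalg.cholesky(K)
    w = np.linalg.solve(L.T, np.linalg.solve(L, z))  # weights K^{-1} z
    mean = float(w @ f(nodes))
    var = max(float(prior_var - w @ z), 0.0)  # clip round-off negatives
    return mean, var, prior_var


# Example: integrate cos(x) against N(0, 1); the exact value is exp(-1/2).
nodes = np.linspace(-3.0, 3.0, 7)
mean, var, prior_var = bayesian_quadrature(np.cos, nodes)
print(mean, np.sqrt(var), np.exp(-0.5))
```

The posterior standard deviation `np.sqrt(var)` plays the role of the uncertainty statement about numerical error; how well calibrated it is depends on the kernel and length-scale, which are fixed by assumption here rather than estimated.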
