Maximum likelihood estimation and uncertainty quantification for Gaussian process approximation of deterministic functions

Despite the ubiquity of the Gaussian process regression model, few theoretical results are available that account for the fact that parameters of the covariance kernel typically need to be estimated from the dataset. This article provides one of the first theoretical analyses in the context of Gaussian process regression with a noiseless dataset. Specifically, we consider the scenario where the scale parameter of a Sobolev kernel (such as a Matérn kernel) is estimated by maximum likelihood. We show that maximum likelihood estimation of the scale parameter alone provides significant adaptation against misspecification of the Gaussian process model, in the sense that, at worst, the model becomes overconfident only slowly, regardless of the difference between the smoothness of the data-generating function and the smoothness expected by the model. The analysis is based on a combination of techniques from nonparametric regression and scattered data interpolation. Empirical results are provided in support of the theoretical findings.
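To make the setting concrete, the following is a minimal sketch (not the authors' code) of the scenario described above: noiseless observations of a deterministic function, a Matérn kernel whose scale (amplitude) parameter is fit by maximum likelihood in closed form, and the resulting Gaussian process posterior used for uncertainty quantification. The kernel choice (Matérn-3/2), the lengthscale, and the test function are illustrative assumptions, not quantities taken from the paper.

```python
# Sketch, assuming a Matérn-3/2 kernel and a fixed lengthscale of 0.2.
import numpy as np

def matern32(x, y, lengthscale=0.2):
    """Matérn-3/2 kernel with unit scale; x and y are 1-D arrays of inputs."""
    r = np.abs(x[:, None] - y[None, :]) / lengthscale
    return (1.0 + np.sqrt(3.0) * r) * np.exp(-np.sqrt(3.0) * r)

# Noiseless data from a deterministic test function (illustrative choice).
rng = np.random.default_rng(0)
x_train = np.sort(rng.uniform(0.0, 1.0, size=20))
f = lambda t: np.sin(6.0 * np.pi * t) * t          # data-generating function
y_train = f(x_train)                                # no observation noise

# Maximum likelihood estimate of the scale parameter sigma^2 in sigma^2 * k.
# With noiseless data the marginal likelihood is maximized in closed form at
#   sigma_hat^2 = y^T K^{-1} y / n,
# where K is the kernel matrix of the unit-scale kernel.
K = matern32(x_train, x_train) + 1e-10 * np.eye(len(x_train))  # jitter for stability
L = np.linalg.cholesky(K)
alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))      # alpha = K^{-1} y
sigma2_hat = y_train @ alpha / len(x_train)

# GP posterior mean (the kernel interpolant) and variance at test points.
x_test = np.linspace(0.0, 1.0, 200)
K_star = matern32(x_test, x_train)
mean = K_star @ alpha
v = np.linalg.solve(L, K_star.T)
var = sigma2_hat * (np.ones(len(x_test)) - np.sum(v**2, axis=0))
std = np.sqrt(np.maximum(var, 0.0))

# The kind of question the paper studies: do the credible intervals
# mean +/- 2*std cover f even when the smoothness of f does not match
# the smoothness implied by the Matérn kernel?
coverage = np.mean(np.abs(f(x_test) - mean) <= 2.0 * std)
print(f"MLE scale: {sigma2_hat:.3f}, empirical coverage: {coverage:.2%}")
```

Varying the smoothness of `f` relative to the kernel and tracking the empirical coverage as the number of design points grows is one way to probe the "slowly overconfident at worst" behaviour that the theory describes.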
