Regularity dependence of the rate of convergence of the learning curve for Gaussian process regression

This paper deals with the rate of convergence of the learning curve in a Gaussian process regression framework. The learning curve describes the average generalization error of the Gaussian process used for the regression; more specifically, it is defined in this paper as the integral of the mean squared error over the input parameter space with respect to the probability measure of the input parameters. The main result is the proof of a theorem giving the mean squared error as a function of the number of observations, for a large class of kernels and in any dimension, when the number of observations is large. From this result, the asymptotic behavior of the generalization error can be deduced. The proof presented here generalizes previous ones, which were limited to more specific kernels or to small dimensions (one or two). The result can be used to build an optimal strategy for resource allocation; this strategy is applied successfully to a nuclear safety problem.
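As a minimal formalization of the learning curve described above (the notation $\hat{f}_n$, $\sigma_n^2$, $\mu$ and $\mathcal{X}$ is assumed here and may differ from the paper's own symbols): writing $\hat{f}_n$ for the Gaussian process posterior mean after $n$ observations, $\sigma_n^2(x)$ for the posterior variance at input $x$, and $\mu$ for the probability measure of the input parameters on the input space $\mathcal{X}$, the learning curve is the integrated mean squared error as a function of $n$.

```latex
% Sketch of the learning curve (integrated mean squared error).
% The symbols f, \hat{f}_n, \sigma_n^2, \mu, \mathcal{X} are assumed
% notation, not necessarily that of the paper.
\[
  \varepsilon(n)
  \;=\; \int_{\mathcal{X}} \mathbb{E}\!\left[\bigl(f(x) - \hat{f}_n(x)\bigr)^{2}\right]\mathrm{d}\mu(x)
  \;=\; \int_{\mathcal{X}} \sigma_n^{2}(x)\,\mathrm{d}\mu(x),
\]
```

The second equality uses the standard Gaussian process fact that, when the regression function $f$ is a sample path of the process and $\hat{f}_n$ is the posterior mean, the pointwise mean squared error equals the posterior variance $\sigma_n^2(x)$; the theorem mentioned above then describes how $\varepsilon(n)$ decays as $n \to \infty$.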
