Frequentist coverage and sup-norm convergence rate in Gaussian process regression

Gaussian process (GP) regression is a powerful interpolation technique due to its flexibility in capturing nonlinearity. In this paper, we provide a general framework for understanding the frequentist coverage of pointwise and simultaneous Bayesian credible sets in GP regression. As an intermediate result, we develop a Bernstein–von Mises type result under the supremum norm in random-design GP regression. Identifying both the mean and covariance functions of the posterior distribution of the Gaussian process as regularized $M$-estimators, we show that the sampling distribution of the posterior mean function and the centered posterior distribution can each be approximated by a population-level GP. By developing a comparison inequality between two GPs, we provide an exact characterization of the frequentist coverage probabilities of Bayesian pointwise credible intervals and simultaneous credible bands for the regression function. Our results show that inference based on GP regression tends to be conservative: when the prior is under-smoothed, the resulting credible intervals and bands have minimax-optimal sizes, with their frequentist coverage converging to a non-degenerate value between their nominal level and one. As a byproduct of our theory, we show that GP regression also yields a minimax-optimal posterior contraction rate relative to the supremum norm, which provides positive evidence for the long-standing open problem of the optimal supremum-norm contraction rate in GP regression.
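
To make the objects studied above concrete, the following is a minimal, self-contained sketch (not the paper's code) of GP regression under a random design: it forms the posterior mean, the pointwise posterior variance, and the resulting nominal 95% credible interval, and then checks the interval's frequentist coverage at a fixed point by Monte Carlo. The squared-exponential kernel, the true regression function, the noise level, and all tuning values below are illustrative assumptions, not choices taken from the paper.

```python
# Minimal sketch of pointwise credible intervals in GP regression and a
# Monte Carlo check of their frequentist coverage. All settings here
# (kernel, true function f0, noise level, sample size) are illustrative.
import numpy as np

def sq_exp_kernel(x, y, length_scale=0.2):
    """Squared-exponential covariance k(x, y) for scalar inputs."""
    d = x[:, None] - y[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

def gp_posterior(x_train, y_train, x_test, noise_var, length_scale=0.2):
    """Posterior mean and pointwise variance under a zero-mean GP prior."""
    K = sq_exp_kernel(x_train, x_train, length_scale)
    K_star = sq_exp_kernel(x_test, x_train, length_scale)
    A = K + noise_var * np.eye(len(x_train))           # K + sigma^2 I
    alpha = np.linalg.solve(A, y_train)
    mean = K_star @ alpha                               # posterior mean at x_test
    v = np.linalg.solve(A, K_star.T)
    var = np.diag(sq_exp_kernel(x_test, x_test, length_scale)) \
        - np.sum(K_star * v.T, axis=1)                  # pointwise posterior variance
    return mean, np.maximum(var, 0.0)

def f0(x):
    """A smooth 'true' regression function used only for this illustration."""
    return np.sin(2 * np.pi * x) + 0.5 * x

rng = np.random.default_rng(0)
n, noise_sd, n_rep = 200, 0.5, 500
x_star = np.array([0.5])                                # point at which coverage is checked
covered = 0
for _ in range(n_rep):
    x = rng.uniform(0.0, 1.0, n)                        # random design
    y = f0(x) + noise_sd * rng.normal(size=n)
    mean, var = gp_posterior(x, y, x_star, noise_var=noise_sd ** 2)
    half = 1.96 * np.sqrt(var)                          # nominal 95% credible interval
    covered += int(mean[0] - half[0] <= f0(x_star)[0] <= mean[0] + half[0])
print(f"empirical coverage of the 95% pointwise credible interval: {covered / n_rep:.3f}")
```

In line with the conservativeness described above, such a simulation typically returns an empirical coverage at or above the nominal level when the prior smoothness does not exceed that of the true function; the precise limiting value depends on the prior and the regularization, which is what the paper characterizes.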
