Risk bounds when learning infinitely many response functions by ordinary linear regression

Consider the problem of learning a large number of response functions simultaneously based on the same input variables. The training data consist of a single random sample of the input variables, drawn independently from a common distribution, together with the associated responses. The input variables are mapped into a high-dimensional linear space, called the feature space, and the response functions are modelled as linear functionals of the mapped features, with coefficients calibrated via ordinary least squares. We provide convergence guarantees on the worst-case excess prediction risk by controlling the convergence rate of the excess risk uniformly in the response function. The dimension of the feature map is allowed to tend to infinity with the sample size. The collection of response functions, although potentially infinite, is supposed to have a finite Vapnik-Chervonenkis dimension. The derived bound can be applied when building multiple surrogate models within a reasonable computing time.
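To make the setup concrete, the following minimal Python sketch fits many response functions simultaneously by a single ordinary least squares solve on a shared feature map. It is an illustration of the general framework, not the paper's construction: the particular feature map, the sample size, and the number of responses below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

n = 500    # sample size (illustrative)
d = 2      # input dimension (illustrative)
p = 10     # feature-space dimension; the theory lets this grow with n
m = 1000   # number of response functions learned simultaneously

# A single shared sample of the input variables.
X = rng.uniform(-1.0, 1.0, size=(n, d))

# Illustrative feature map (an assumption; the framework allows
# general high-dimensional feature maps).
W = rng.normal(size=(d, p))
def phi(X):
    return np.cos(X @ W)

Phi = phi(X)  # n x p design matrix, shared by all response functions

# Simulate m noisy responses observed at the same inputs (illustrative).
Beta_true = rng.normal(size=(p, m))
Y = Phi @ Beta_true + 0.1 * rng.normal(size=(n, m))  # n x m responses

# One OLS solve calibrates the coefficients of all m responses at once;
# the shared design matrix is factorized a single time.
Beta_hat, *_ = np.linalg.lstsq(Phi, Y, rcond=None)   # p x m coefficients

# Predict every response at new inputs with one matrix product.
X_new = rng.uniform(-1.0, 1.0, size=(5, d))
Y_pred = phi(X_new) @ Beta_hat                        # 5 x m predictions
```

Because the design matrix is shared across responses, fitting all m surrogates costs essentially one regression plus a matrix product, which is what makes building multiple surrogate models in a reasonable computing time feasible.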
