Nested Kriging predictions for datasets with a large number of observations

This work addresses the problem of predicting the value of a real function at some input locations, given a limited number of observations of this function. Kriging interpolation (also known as Gaussian process regression) is often used to tackle this problem, but its computational cost becomes prohibitive when the number of observation points is large. In this article we introduce nested Kriging predictors, which are constructed by aggregating sub-models based on subsets of the observation points. This approach is proven to have better theoretical properties than other aggregation methods found in the literature. In particular, contrary to some other methods, the proposed aggregation method can be shown to be consistent. Finally, the practical interest of the proposed method is illustrated on simulated datasets and on an industrial test case with $10^4$ observations in a 6-dimensional space.
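The aggregation idea can be illustrated with a minimal sketch. The abstract does not spell out the formulas, so everything here is an assumption made for illustration: simple Kriging with a known zero mean, a squared-exponential covariance, a small `jitter` term for numerical stability, and aggregation by the best linear combination of the sub-model predictions, with weights obtained from the covariances between sub-models. The names `sq_exp_kernel`, `nested_kriging_predict` and `groups` are hypothetical.

```python
import numpy as np

def sq_exp_kernel(A, B, lengthscale=0.1, variance=1.0):
    """Squared-exponential covariance between rows of A and rows of B (assumed kernel)."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return variance * np.exp(-0.5 * d2 / lengthscale**2)

def nested_kriging_predict(x, X, y, groups, kernel=sq_exp_kernel, jitter=1e-10):
    """Aggregate zero-mean simple-Kriging sub-models at one point x of shape (d,).

    groups is a list of index arrays, one per subset of observation points.
    Returns the aggregated mean and the aggregated variance.
    """
    x = np.atleast_2d(x)
    n_groups = len(groups)
    # alphas[i] = K_i^{-1} k(X_i, x): Kriging weights of sub-model i
    alphas = []
    for idx in groups:
        K_i = kernel(X[idx], X[idx]) + jitter * np.eye(len(idx))
        alphas.append(np.linalg.solve(K_i, kernel(X[idx], x)).ravel())
    # Sub-model predictions M_i(x) = k(x, X_i) K_i^{-1} y_i
    M = np.array([a @ y[idx] for a, idx in zip(alphas, groups)])
    # k_M[i] = Cov(Y(x), M_i(x)) and K_M[i, j] = Cov(M_i(x), M_j(x))
    k_M = np.array([a @ kernel(X[idx], x).ravel() for a, idx in zip(alphas, groups)])
    K_M = np.empty((n_groups, n_groups))
    for i, gi in enumerate(groups):
        for j, gj in enumerate(groups):
            K_M[i, j] = alphas[i] @ kernel(X[gi], X[gj]) @ alphas[j]
    # Aggregation weights: best linear predictor of Y(x) given the M_i(x)
    w = np.linalg.solve(K_M + jitter * np.eye(n_groups), k_M)
    mean = w @ M
    var = kernel(x, x)[0, 0] - w @ k_M
    return mean, var

# Toy usage: 12 observations of a sine function, split into 3 sub-models.
X = np.linspace(0.0, 1.0, 12)[:, None]
y = np.sin(2.0 * np.pi * X[:, 0])
groups = np.array_split(np.arange(12), 3)
mean, var = nested_kriging_predict(np.array([0.5]), X, y, groups)
```

In this sketch each sub-model only requires solving a linear system of the size of its own subset, and the aggregation step solves a system whose size is the number of sub-models, which is where the computational saving over a single full Kriging model would come from.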
