Analysis Methods for Computer Experiments: How to Assess and What Counts?

Statistical methods based on a regression model plus a zero-mean Gaussian process (GP) have been widely used for predicting the output of a deterministic computer code. There are many suggestions in the literature for how to choose the regression component and how to model the correlation structure of the GP. This article argues that comprehensive, evidence-based assessment strategies are needed when comparing such modeling options; without them, one is easily misled. Applying the strategies to several computer codes shows that a regression model more complex than a constant mean either has little impact on prediction accuracy or is an impediment. The choice of correlation function has a modest effect, but there is little to separate two common choices, the power exponential and the Matérn, if the latter is optimized with respect to its smoothness. The applications presented here also provide no evidence that a composite of GPs provides practical improvement in prediction accuracy. A limited comparison of Bayesian and empirical Bayes methods is similarly inconclusive. In contrast, we find that the effect of experimental design is surprisingly large, even for designs of the same type with the same theoretical properties.
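To make the modeling options concrete, the following is a minimal sketch, not the article's implementation, of a constant-mean GP predictor under the two correlation families named above. Several things here are assumptions for illustration: the product form of the correlation over input dimensions, the Matérn smoothness fixed at 5/2 rather than optimized, the fixed (not estimated) parameters `theta` and `p`, the toy function `toy_code`, and the normalized hold-out RMSE used as the accuracy measure. All function names are hypothetical.

```python
import numpy as np

def power_exponential(X1, X2, theta, p):
    """Power-exponential correlation: exp(-sum_j theta_j * |h_j| ** p_j)."""
    h = np.abs(X1[:, None, :] - X2[None, :, :])
    return np.exp(-np.sum(theta * h ** p, axis=2))

def matern52(X1, X2, theta):
    """Matern correlation with smoothness 5/2, as a product over dimensions."""
    s = np.sqrt(5.0) * theta * np.abs(X1[:, None, :] - X2[None, :, :])
    return np.prod((1.0 + s + s ** 2 / 3.0) * np.exp(-s), axis=2)

def predict_constant_mean(R, r, y):
    """Predictor for a constant-mean GP with known correlation.

    R: (n, n) correlation among training inputs.
    r: (m, n) correlation between new and training inputs.
    """
    ones = np.ones(len(y))
    # Generalized least squares estimate of the constant mean.
    beta = (ones @ np.linalg.solve(R, y)) / (ones @ np.linalg.solve(R, ones))
    return beta + r @ np.linalg.solve(R, y - beta * ones)

def normalized_rmse(y_hat, y_test, y_train):
    """Hold-out RMSE divided by the RMSE of the trivial predictor mean(y_train)."""
    rmse = np.sqrt(np.mean((y_hat - y_test) ** 2))
    trivial = np.sqrt(np.mean((np.mean(y_train) - y_test) ** 2))
    return rmse / trivial

def toy_code(X):
    """Hypothetical stand-in for a deterministic computer code."""
    return np.sin(2.0 * np.pi * X[:, 0]) + X[:, 1] ** 2

rng = np.random.default_rng(0)
X_train = rng.random((20, 2))   # a random design, purely for illustration
X_test = rng.random((200, 2))   # hold-out set
y_train, y_test = toy_code(X_train), toy_code(X_test)

theta = np.array([5.0, 5.0])    # assumed values; normally estimated, e.g. by maximum likelihood
p = np.array([1.5, 1.5])        # assumed power-exponential exponents

models = {
    "power exponential": lambda A, B: power_exponential(A, B, theta, p),
    "Matern 5/2": lambda A, B: matern52(A, B, theta),
}
for name, corr in models.items():
    y_hat = predict_constant_mean(corr(X_train, X_train), corr(X_test, X_train), y_train)
    print(name, normalized_rmse(y_hat, y_test, y_train))
```

A single run like this says little on its own. Rerunning with different seeds for `X_train` gives a rough feel for how much the design alone can move the accuracy figure, which is the kind of variation the abstract warns about.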
