Analysis of computationally demanding models with continuous and categorical inputs

The analysis of many physical and engineering problems involves running complex computational models (e.g., simulation models and computer codes). With problems of this type, it is important to understand the relationships between the input variables (whose values are often imprecisely known) and the output variables, and to characterize the uncertainty in the output. Often, some of the input variables are categorical in nature (e.g., pointer variables to alternative models or to different types of material). A computational model that represents reality sufficiently well is often very costly in terms of run time. When models are computationally demanding, meta-model approaches to their analysis have been shown to be very useful. However, the most popular meta-models for computer models do not explicitly allow for categorical input variables. Instead, categorical inputs are simply ordered in some way and treated as continuous variables in the estimation of the meta-model, which in many cases can lead to undesirable and misleading results. In this paper, two meta-models based on functional ANOVA decomposition are presented that explicitly allow for an appropriate treatment of categorical inputs. The effectiveness of these meta-models in the analysis of models with continuous and categorical inputs is illustrated with several test cases and with results from a real analysis.
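
The risk of ordering a categorical input and treating it as continuous can be seen in a minimal sketch. This is not one of the paper's meta-models: the toy simulator, the material names, and the level effects are hypothetical, and a straight-line least-squares fit stands in for a continuous meta-model. The output depends only on the categorical input, yet under an arbitrary ordering the fitted slope is zero and the input appears to have no effect, whereas one fitted mean per level reproduces the output exactly.

```python
from statistics import mean

# Hypothetical toy "simulator": the output depends only on a categorical input.
LEVEL_EFFECT = {"steel": 0.0, "copper": 5.0, "glass": 0.0}

def toy_model(material):
    return LEVEL_EFFECT[material]

materials = ["steel", "copper", "glass"] * 4
y = [toy_model(m) for m in materials]

# (a) Arbitrary ordering: code the levels 0, 1, 2 and fit a straight line.
code = {"steel": 0, "copper": 1, "glass": 2}
x = [code[m] for m in materials]
x_bar, y_bar = mean(x), mean(y)
slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
         / sum((xi - x_bar) ** 2 for xi in x))
intercept = y_bar - slope * x_bar
sse_ordered = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))

# (b) Categorical treatment: one fitted mean per level.
level_mean = {m: mean(yi for mi, yi in zip(materials, y) if mi == m)
              for m in code}
sse_categorical = sum((yi - level_mean[mi]) ** 2 for mi, yi in zip(materials, y))

print(f"slope with ordered coding: {slope:.3f}")
print(f"residual SS, ordered coding: {sse_ordered:.3f}")
print(f"residual SS, per-level means: {sse_categorical:.3f}")
```

A different ordering of the same three levels would give a different slope, which is the sense in which the result is an artifact of the coding rather than of the model being analyzed.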
