Analyzing Stochastic Computer Models: A Review with Opportunities

In modern science, computer models are often used to understand complex phenomena, and a thriving statistical community has grown around analyzing them. This review spotlights the growing prevalence of stochastic computer models. It provides a catalogue of statistical methods for practitioners, an introduction for statisticians (whether or not they are familiar with deterministic computer models), and an emphasis on open questions relevant to both groups. Gaussian process surrogate models take center stage, and they are explained along with several extensions needed for stochastic settings. The basic issues of designing a stochastic computer experiment and calibrating a stochastic computer model feature prominently in the discussion. Instructive examples, with data and code, illustrate the implementation of, and results from, various methods.
