Emulation Engines: Choice and Quantification of Uncertainty for Complex Hydrological Models

Complex, mechanistic hydrological models can be computationally expensive, have large numbers of input parameters, and generate multivariate output. Model emulators can be constructed to approximate these complex models with substantial computational savings, making activities such as sensitivity analysis, calibration and uncertainty analysis feasible. An emulator is only useful if it predicts the model output accurately and precisely. However, it is often unclear which type of emulation approach will be suitable. We present a comparison of reduced-rank, multivariate emulators built upon different ‘emulation engines’ and apply them to the Australian Water Resource Assessment System model. We examine first-order approaches, which focus on specifying the mean, and second-order approaches, which focus on specifying the covariance. We also introduce a nonparametric approach for quantifying the uncertainty associated with the emulated prediction when the predicted quantity has bounded support. Our results demonstrate that emulation engines based on second-order approaches, such as Gaussian processes, can be computationally burdensome and may perform comparably to computationally efficient first-order methods such as random forests. Supplementary materials accompanying this paper appear online.
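The reduced-rank, multivariate emulator structure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes a principal-component (EOF) basis for the multivariate output and fits one independent engine per retained component score. The second-order engine is a hypothetical zero-mean Gaussian process with fixed squared-exponential hyperparameters. The names `fit_basis`, `GPEngine`, `ReducedRankEmulator` and `toy_simulator` are illustrative only. The output variance also ignores the error contributed by discarded components.

```python
import numpy as np


def fit_basis(Y, k):
    """Reduced-rank (principal component / EOF) basis for multivariate output.

    Y has one row per simulator run and one column per output (e.g. time step).
    Returns the output mean, the leading k basis vectors and the run scores.
    """
    mu = Y.mean(axis=0)
    _, _, Vt = np.linalg.svd(Y - mu, full_matrices=False)
    basis = Vt[:k]                  # (k, n_outputs), orthonormal rows
    scores = (Y - mu) @ basis.T     # (n_runs, k)
    return mu, basis, scores


class GPEngine:
    """Hypothetical second-order engine: zero-mean GP, squared-exponential kernel.

    Hyperparameters are fixed rather than estimated, to keep the sketch short.
    """

    def __init__(self, length=0.3, rel_nugget=1e-6):
        self.length = length
        self.rel_nugget = rel_nugget

    def _corr(self, A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
        return np.exp(-0.5 * d2 / self.length ** 2)

    def fit(self, X, y):
        self.X = X
        self.variance = float(np.var(y)) or 1.0   # process variance taken from the data
        K = self.variance * (self._corr(X, X) + self.rel_nugget * np.eye(len(X)))
        self.L = np.linalg.cholesky(K)
        self.alpha = np.linalg.solve(self.L.T, np.linalg.solve(self.L, y))
        return self

    def predict(self, X_new):
        Ks = self.variance * self._corr(X_new, self.X)   # (n_new, n_train)
        mean = Ks @ self.alpha
        v = np.linalg.solve(self.L, Ks.T)
        var = self.variance - (v ** 2).sum(axis=0)        # GP predictive variance
        return mean, np.maximum(var, 0.0)


class ReducedRankEmulator:
    """Project output onto k components and emulate each score with its own engine."""

    def __init__(self, k, engine_factory):
        self.k = k
        self.engine_factory = engine_factory

    def fit(self, X, Y):
        self.mu, self.basis, scores = fit_basis(Y, self.k)
        self.engines = [self.engine_factory().fit(X, scores[:, j]) for j in range(self.k)]
        return self

    def predict(self, X_new):
        preds = [engine.predict(X_new) for engine in self.engines]
        means = np.column_stack([m for m, _ in preds])   # (n_new, k) score means
        varis = np.column_stack([v for _, v in preds])   # (n_new, k) score variances
        Y_mean = self.mu + means @ self.basis
        # Scores treated as independent: Var(y_i) = sum_j var_j * basis_ji^2.
        Y_var = varis @ (self.basis ** 2)
        return Y_mean, Y_var


def toy_simulator(x, t=np.linspace(0.0, 1.0, 50)):
    """Stand-in for an expensive model: two inputs -> a 50-step output series."""
    return x[0] * np.sin(2 * np.pi * t) + x[1] * t


g = np.linspace(0.0, 1.0, 6)
X_train = np.array([[a, b] for a in g for b in g])       # 36-run design on a grid
Y_train = np.array([toy_simulator(x) for x in X_train])

emulator = ReducedRankEmulator(k=2, engine_factory=GPEngine).fit(X_train, Y_train)
X_test = np.array([[0.5, 0.5], [0.3, 0.7]])
Y_mean, Y_var = emulator.predict(X_test)
Y_true = np.array([toy_simulator(x) for x in X_test])
print("max abs error:", np.abs(Y_mean - Y_true).max())
```

Each engine only needs `fit(X, y)` and a `predict(X_new)` that returns a mean and a variance. A first-order engine such as a random forest could therefore be wrapped behind the same interface, so engines can be compared on equal terms. How such an engine supplies its predictive uncertainty is a separate modelling choice.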
