Batch-sequential design and heteroskedastic surrogate modeling for delta smelt conservation

Delta smelt is an endangered fish species in the San Francisco estuary whose population has declined overall over the past 30 years. Researchers have developed a stochastic, agent-based simulator of the system, with the goal of understanding the relative contributions of the natural and anthropogenic factors thought to play a role in that decline. However, the input configuration space is high-dimensional, each simulator run is time-consuming, and the noisy outputs change nonlinearly in both mean and variance. Collecting enough runs to learn the input-output dynamics effectively requires both a nimble modeling strategy and parallel evaluation on a supercomputer. Recent advances in heteroskedastic Gaussian process (HetGP) surrogate modeling help, but little is known about how to plan experiments when simulator evaluations are highly distributed. We propose a batch-sequential design scheme that generalizes one-at-a-time, variance-based active learning for HetGP surrogates, as a means of keeping multi-core cluster nodes fully engaged with expensive runs. Our acquisition strategy is engineered to favor the selection of replicates, which improve both statistical and computational efficiency when training surrogates to isolate signal in high-noise regions. Design and modeling performance is illustrated on a range of toy examples before we embark on a large-scale smelt simulation campaign and a downstream high-fidelity input sensitivity analysis.
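The variance-based idea behind this kind of active learning can be sketched in a few lines. The sketch below is not the authors' acquisition strategy; it is a minimal illustration assuming a 1-D input, a squared-exponential kernel with a fixed, hypothetical lengthscale, and a known, hypothetical noise function `noise_var` standing in for the variance a HetGP surrogate would estimate. It scores a design by its integrated mean-squared prediction error (IMSPE), here the latent predictive variance averaged over a reference grid, and fills a batch greedily, one run at a time, choosing either a replicate of an existing site or a new site, whichever lowers IMSPE most. Greedy augmentation is a simplification chosen for this sketch. Two properties make the approach workable for batches: with fixed hyperparameters, the GP predictive variance does not depend on the simulator outputs, so a whole batch can be planned before any of its runs finish; and a site observed a_i times with noise variance r(x_i) can be collapsed into a single row with variance r(x_i)/a_i, so the linear algebra scales with the number of unique sites rather than the total number of runs.

```python
import numpy as np

# Minimal sketch (not the paper's method): greedy IMSPE-based batch selection
# for a GP with known, input-dependent noise. All settings are hypothetical.

def sq_exp_kernel(A, B, lengthscale=0.2):
    """Squared-exponential kernel k(x, x') = exp(-(x - x')^2 / lengthscale), unit scale."""
    d2 = (A[:, None] - B[None, :]) ** 2
    return np.exp(-d2 / lengthscale)

def noise_var(x):
    """Hypothetical noise variance r(x); a HetGP surrogate would estimate this."""
    return 0.01 + 0.5 * x ** 2

def imspe(U, counts, ref, lengthscale=0.2):
    """Latent predictive variance averaged over the reference grid `ref`.

    Each unique site U[i], run counts[i] times, enters as a single row with
    noise variance r(U[i]) / counts[i], so the solve is sized by unique sites."""
    K = sq_exp_kernel(U, U, lengthscale) + np.diag(noise_var(U) / counts)
    k_ref = sq_exp_kernel(ref, U, lengthscale)   # shape (m, n)
    sol = np.linalg.solve(K, k_ref.T)            # K^{-1} k(U, ref), shape (n, m)
    var = 1.0 - np.sum(k_ref * sol.T, axis=1)    # k(x, x) = 1 for this kernel
    return var.mean()

def greedy_batch(U, counts, fresh, batch_size, ref):
    """Fill a batch one run at a time; each run is either a replicate of an
    existing unique site or a new site from `fresh`, whichever lowers IMSPE most.
    No simulator outputs are needed: the variance depends only on the inputs."""
    U, counts, fresh = U.copy(), counts.astype(float), fresh.copy()
    picks = []
    for _ in range(batch_size):
        best = (np.inf, None, None)
        for i in range(len(U)):                  # option 1: replicate site i
            c = counts.copy()
            c[i] += 1
            score = imspe(U, c, ref)
            if score < best[0]:
                best = (score, "replicate", i)
        for j, x in enumerate(fresh):            # option 2: add new site x
            score = imspe(np.append(U, x), np.append(counts, 1.0), ref)
            if score < best[0]:
                best = (score, "new", j)
        _, kind, idx = best
        if kind == "replicate":
            counts[idx] += 1
            picks.append(U[idx])
        else:
            picks.append(fresh[idx])
            U = np.append(U, fresh[idx])
            counts = np.append(counts, 1.0)
            fresh = np.delete(fresh, idx)
    return np.array(picks), U, counts

# Usage: six initial sites with one run each, then a batch of eight new runs.
U0 = np.linspace(0.0, 1.0, 6)
a0 = np.ones_like(U0)
ref = np.linspace(0.0, 1.0, 101)
fresh = np.linspace(0.05, 0.95, 10)
picks, U, a = greedy_batch(U0, a0, fresh, batch_size=8, ref=ref)
print("batch:", np.round(picks, 2))
print("IMSPE before/after:", imspe(U0, a0, ref), imspe(U, a, ref))
```

Whether replicates or new sites win depends on how `noise_var` varies relative to the kernel lengthscale; making the hypothetical noise function louder on part of the domain is an easy way to explore that replication-versus-exploration trade-off.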
