Optimal design for correlated processes with input-dependent noise

Optimal design for parameter estimation in Gaussian process regression models with input-dependent noise is examined. The motivation stems from the area of computer experiments, where computationally demanding simulators are approximated using Gaussian process emulators to act as statistical surrogates. In the case of stochastic simulators, which produce a random output for a given set of model inputs, repeated evaluations are useful, supporting the use of replicate observations in the experimental design. The findings are also applicable to the wider context of experimental design for Gaussian process regression and kriging. Designs are proposed with the aim of minimising the variance of the Gaussian process parameter estimates. A heteroscedastic Gaussian process model is presented which allows for an experimental design technique based on an extension of Fisher information to heteroscedastic models. It is empirically shown that the error of the approximation of the parameter variance by the inverse of the Fisher information is reduced as the number of replicated points is increased. Through a series of simulation experiments on both synthetic data and a systems biology stochastic simulator, optimal designs with replicate observations are shown to outperform space-filling designs both with and without replicate observations. Guidance is provided on best practice for optimal experimental design for stochastic response models.
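
As an illustration of the Fisher-information criterion described above, the following is a minimal sketch and not the authors' implementation. It assumes a zero-mean Gaussian process with a squared-exponential covariance in one input dimension, a known input-dependent noise variance (the hypothetical `noise_fn`), and that the replicates at each design point are averaged, so the noise variance at point i becomes `noise_fn(x_i) / n_i`. It uses the standard expression F_ij = 0.5 tr(K^{-1} dK/dθ_i K^{-1} dK/dθ_j) for the covariance parameters θ = (variance, lengthscale). All function and parameter names are illustrative.

```python
import numpy as np

def fisher_information(x, reps, variance, lengthscale, noise_fn):
    """Fisher information for (variance, lengthscale) from replicate means.

    Assumptions of this sketch: zero-mean GP, squared-exponential kernel,
    noise variance noise_fn(x) known, replicates averaged at each point.
    """
    x = np.asarray(x, dtype=float)
    reps = np.asarray(reps, dtype=float)
    d2 = (x[:, None] - x[None, :]) ** 2
    corr = np.exp(-0.5 * d2 / lengthscale**2)

    # Covariance of the replicate means: kernel plus heteroscedastic noise / n_i.
    K = variance * corr + np.diag(noise_fn(x) / reps)

    # Derivatives of K with respect to the variance and the lengthscale.
    dK = [corr, variance * corr * d2 / lengthscale**3]

    K_inv = np.linalg.inv(K)
    A = [K_inv @ d for d in dK]
    p = len(dK)
    F = np.empty((p, p))
    for i in range(p):
        for j in range(p):
            F[i, j] = 0.5 * np.trace(A[i] @ A[j])
    return F

def log_det_criterion(x, reps, variance, lengthscale, noise_fn):
    """D-optimality-style score: log-determinant of the Fisher information."""
    F = fisher_information(x, reps, variance, lengthscale, noise_fn)
    _, logdet = np.linalg.slogdet(F)
    return logdet

# Hypothetical noise model and candidate designs, for illustration only.
noise = lambda x: 0.1 + 0.5 * x**2
design = np.linspace(0.0, 1.0, 6)
score_single = log_det_criterion(design, np.ones(6), 1.0, 0.3, noise)
score_replicated = log_det_criterion(design, 5 * np.ones(6), 1.0, 0.3, noise)
print(score_single, score_replicated)
```

A design search would compare such scores across candidate point sets and replicate allocations under a fixed simulation budget. Note that the inverse of this matrix only approximates the variance of the parameter estimates, and that the comparison above gives the replicated design five times the budget of the unreplicated one.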
