GPfit: An R Package for Fitting a Gaussian Process Model to Deterministic Simulator Outputs

Gaussian process (GP) models are commonly used statistical metamodels for emulating expensive computer simulators. Fitting a GP model can be numerically unstable if any pair of design points in the input space is close together. Ranjan, Haynes, and Karsten (2011) proposed a computationally stable approach for fitting GP models to deterministic computer simulators. They maximized the likelihood with a genetic-algorithm-based approach that is robust but computationally intensive. This paper implements a slightly modified version of the model proposed by Ranjan et al. (2011) in the R package GPfit. A novel parameterization of the spatial correlation function and a clustering-based, multi-start, gradient-based optimization algorithm yield robust optimization that is typically faster than the genetic-algorithm-based approach. We present two examples with R code to illustrate the usage of the main functions in GPfit. Several test functions are used for performance comparison with the popular R package mlegp. We also use GPfit for a real application: emulating the tidal kinetic energy model for the Bay of Fundy, Nova Scotia, Canada. GPfit is free software, distributed under the General Public License and available from the Comprehensive R Archive Network.
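
The instability caused by nearby design points can be seen in the smallest possible case, a design with two points. The sketch below is illustrative only and is not taken from GPfit: it assumes a Gaussian correlation exp(-theta * (x1 - x2)^2) with a hypothetical theta = 1, and it uses a small nugget (a constant added to the diagonal) as one common remedy, which may differ from the exact procedure in Ranjan et al. (2011). The function names are made up for this example.

```python
import math


def gaussian_corr(x1, x2, theta=1.0):
    """Gaussian correlation between two scalar inputs (assumed form)."""
    return math.exp(-theta * (x1 - x2) ** 2)


def condition_number_two_points(x1, x2, theta=1.0, nugget=0.0):
    """2-norm condition number of the 2x2 matrix R + nugget * I.

    For R = [[1, r], [r, 1]] the eigenvalues are 1 + r and 1 - r, so the
    condition number is (1 + r + nugget) / (1 - r + nugget).
    """
    r = gaussian_corr(x1, x2, theta)
    return (1.0 + r + nugget) / (1.0 - r + nugget)


# Shrink the gap between the two design points and watch the conditioning.
for gap in (0.5, 1e-2, 1e-4):
    plain = condition_number_two_points(0.0, gap)
    jittered = condition_number_two_points(0.0, gap, nugget=1e-6)
    print(f"gap={gap:g}  cond(R)={plain:.3e}  cond(R + nugget*I)={jittered:.3e}")
```

As the gap shrinks, the correlation r approaches 1, the smaller eigenvalue 1 - r approaches 0, and the condition number grows without bound. The nugget keeps the smaller eigenvalue away from zero, which caps the condition number at the cost of no longer interpolating the data exactly.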

[1]  A. Raftery,et al.  Inference for Deterministic Simulation Models: The Bayesian Melding Approach , 2000 .

[2]  R Core Team,et al.  R: A language and environment for statistical computing. , 2014 .

[3]  Gabor Grothendieck,et al.  Lattice: Multivariate Data Visualization with R , 2008 .

[4]  Radford M. Neal Monte Carlo Implementation of Gaussian Process Models for Bayesian Regression and Classification , 1997, physics/9701026.

[5]  Carl E. Rasmussen,et al.  In Advances in Neural Information Processing Systems , 2011 .

[6]  Jerome Sacks,et al.  Choosing the Sample Size of a Computer Experiment: A Practical Guide , 2009, Technometrics.

[7]  Giuseppe De Nicolao,et al.  Efficient Marginal Likelihood Computation for Gaussian Process Regression , 2011 .

[8]  Thomas J. Santner,et al.  Design and analysis of computer experiments , 1998 .

[9]  Kurt Hornik,et al.  Escaping RGBland: Selecting colors for statistical graphics , 2009, Comput. Stat. Data Anal..

[10]  Carl E. Rasmussen,et al.  Gaussian processes for machine learning , 2005, Adaptive computation and machine learning.

[11]  J. Nocedal,et al.  A Limited Memory Algorithm for Bound Constrained Optimization , 1995, SIAM J. Sci. Comput..

[12]  Robert B. Gramacy,et al.  Bayesian Treed Gaussian Process Models with an Application to Computer Modeling , 2009 .

[13]  P. Siarry,et al.  An improvement of the standard genetic algorithm fighting premature convergence in continuous optimization , 2000 .

[14]  Richard J. Beckman,et al.  A Comparison of Three Methods for Selecting Values of Input Variables in the Analysis of Output From a Computer Code , 2000, Technometrics.

[15]  Bogdan Filipic,et al.  Optimization of Gaussian Process Models with Evolutionary Algorithms , 2011, ICANNGA.

[16]  R. Carnell Latin Hypercube Samples , 2016 .

[17]  Pritam Ranjan,et al.  A Computationally Stable Approach to Gaussian Process Interpolation of Deterministic Computer Simulation Data , 2010, Technometrics.

[18]  Tao Yu,et al.  Reliable multi-objective optimization of high-speed WEDM process based on Gaussian process regression , 2008 .

[19]  A. J. Booker,et al.  A rigorous framework for optimization of expensive functions by surrogates , 1998 .

[20]  Neil D. Lawrence,et al.  A Simple Approach to Ranking Differentially Expressed Gene Expression Time Courses through Gaussian Process Regression , 2011, BMC Bioinformatics.

[21]  Robert B. Gramacy,et al.  tgp: An R Package for Bayesian Nonstationary, Semiparametric Nonlinear Regression and Design by Treed Gaussian Process Models , 2007 .

[22]  M. Stein Large sample properties of simulations using latin hypercube sampling , 1987 .

[23]  Ronald D. Haynes,et al.  Assessment of tidal current energy in the Minas Passage, Bay of Fundy , 2008 .

[24]  Robert B. Gramacy,et al.  Cases for the nugget in modeling computer experiments , 2010, Statistics and Computing.

[25]  Donald R. Jones,et al.  Efficient Global Optimization of Expensive Black-Box Functions , 1998, J. Glob. Optim..

[26]  Karin S. Dorman,et al.  mlegp: statistical analysis for computer models of biological systems using R , 2008, Bioinform..

[27]  Alexandre Arbey Dark fluid: A complex scalar field to unify dark energy and dark matter , 2006 .

[28]  Javier J. Sánchez Medina,et al.  Stochastic Vs Deterministic Traffic Simulator. Comparative Study for Its Use Within a Traffic Light Cycles Optimization Architecture , 2005, IWINAC.