Cross-Validation-based Adaptive Sampling for Gaussian Process Models

In many real-world applications, we are interested in approximating costly black-box functions as accurately as possible with the smallest number of function evaluations. A complex computer code is an example of such a function. In this work, a Gaussian process (GP) emulator is used to approximate the output of a complex computer code. We consider the problem of sequentially extending an initial experiment to improve the emulator. A sequential sampling approach based on leave-one-out (LOO) cross-validation is proposed that can be easily extended to a batch mode. This is a desirable property since it saves the user time when parallel computing is available. After fitting a GP to the training data points, the expected squared LOO error ($ESE_{LOO}$) is calculated at each design point. $ESE_{LOO}$ is used as a measure to identify important data points. More precisely, a large value of this quantity at a point means that the quality of prediction depends a great deal on that point, and adding more samples in the nearby region could improve the accuracy of the GP model. As a result, it is reasonable to select the next sample where $ESE_{LOO}$ is maximum. However, this quantity is only known at the design points and needs to be estimated at unobserved points. To do this, a second GP is fitted to the $ESE_{LOO}$ values, and the maximiser of a modified expected improvement (EI) criterion is chosen as the next sample. EI is a popular acquisition function in Bayesian optimisation and is used to trade off local and global search. However, it has a tendency towards exploitation, meaning that its maximum is close to the (current) "best" sample. To avoid clustering, a modified version of EI, called pseudo expected improvement, is employed, which is more explorative than EI and allows us to discover unexplored regions. The results show that the proposed sampling method is promising.
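
The sampling loop described above can be illustrated with a minimal one-dimensional sketch. It is not the authors' implementation, and several choices are assumptions made for illustration only: a zero-mean GP with a fixed squared-exponential kernel (no hyperparameter estimation), $ESE_{LOO}$ taken as the LOO predictive variance plus the squared LOO residual, a second zero-mean GP fitted directly to the raw $ESE_{LOO}$ values, and a pseudo expected improvement formed by multiplying EI by the product of $1 - \mathrm{corr}(x, x_i)$ over the design points. The function names (`kernel`, `gp_predict`, `ese_loo`, `expected_improvement`, `next_sample`) and the toy test function are hypothetical.

```python
import math
import numpy as np


def kernel(a, b, length_scale=0.15, variance=1.0):
    """Squared-exponential covariance between 1-D point arrays a and b."""
    d = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (d / length_scale) ** 2)


def gp_predict(X, y, x_new, nugget=1e-6):
    """Zero-mean GP posterior mean and variance at x_new (fixed kernel)."""
    K = kernel(X, X) + nugget * np.eye(len(X))
    k_star = kernel(x_new, X)
    mean = k_star @ np.linalg.solve(K, y)
    reduction = np.einsum("ij,ji->i", k_star, np.linalg.solve(K, k_star.T))
    var = kernel(x_new, x_new).diagonal() - reduction
    return mean, np.clip(var, 1e-12, None)  # guard against round-off


def ese_loo(X, y, nugget=1e-6):
    """Assumed ESE_LOO: LOO variance plus squared LOO residual.

    Closed-form identities avoid refitting n models:
    y_i - m_{-i} = [K^{-1} y]_i / [K^{-1}]_ii and s_{-i}^2 = 1 / [K^{-1}]_ii.
    """
    K_inv = np.linalg.inv(kernel(X, X) + nugget * np.eye(len(X)))
    diag = np.diag(K_inv)
    return 1.0 / diag + ((K_inv @ y) / diag) ** 2


def expected_improvement(mean, var, best):
    """EI for maximisation: E[max(Y - best, 0)] with Y ~ N(mean, var)."""
    s = np.sqrt(var)
    z = (mean - best) / s
    cdf = 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))
    pdf = np.exp(-0.5 * z**2) / math.sqrt(2.0 * math.pi)
    return np.maximum((mean - best) * cdf + s * pdf, 0.0)


def next_sample(X, y, candidates):
    """Pick the candidate maximising EI(x) * prod_i (1 - corr(x, x_i))."""
    errors = ese_loo(X, y)
    # Second GP, fitted to the ESE_LOO values at the design points.
    mean, var = gp_predict(X, errors, candidates)
    ei = expected_improvement(mean, var, errors.max())
    # With variance=1.0 the kernel is a correlation, so this factor
    # vanishes at existing design points and discourages clustering.
    repulsion = np.prod(1.0 - kernel(candidates, X), axis=1)
    return candidates[np.argmax(ei * repulsion)]


# Toy example: 8 equally spaced design points on [0, 1].
X = np.linspace(0.0, 1.0, 8)
y = X * np.sin(2.0 * np.pi * X)
candidates = np.linspace(0.0, 1.0, 401)
x_next = next_sample(X, y, candidates)
print("next sample:", x_next)
```

In a batch setting, one plausible extension would be to append each selected point to the set used in the repulsion factor before choosing the next one, so that a batch spreads out without new function evaluations; that variant is not shown here.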
