Expected improvement versus predicted value in surrogate-based optimization

Surrogate-based optimization relies on so-called infill criteria (acquisition functions) to decide which point to evaluate next. When Kriging is used as the surrogate model of choice (a setting also called Bayesian optimization), one of the most frequently chosen criteria is expected improvement. We argue that the popularity of expected improvement largely rests on its theoretical properties rather than on empirically validated performance. A few results from the literature provide evidence that, under certain conditions, expected improvement may perform worse than something as simple as the predicted value of the surrogate model. We benchmark both infill criteria in an extensive empirical study on the 'BBOB' function set. This investigation includes a detailed study of the impact of problem dimensionality on algorithm performance. The results support the hypothesis that exploration loses importance with increasing problem dimensionality. A statistical analysis reveals that the purely exploitative search with the predicted value criterion performs better on most problems of five or more dimensions. Possible reasons for these results are discussed. In addition, we give an in-depth guide for choosing the infill criterion based on prior knowledge about the problem at hand, its dimensionality, and the available budget.
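The two infill criteria under comparison are both closed-form functions of the surrogate's posterior mean and standard deviation at a candidate point. The sketch below (for minimization) is illustrative only; function names are our own, and the paper's actual implementation is not assumed.

```python
import math

def expected_improvement(mu, sigma, f_best):
    """Expected improvement at a candidate point (minimization).

    mu, sigma: surrogate posterior mean and standard deviation.
    f_best: best objective value observed so far.
    EI(x) = (f_best - mu) * Phi(z) + sigma * phi(z), z = (f_best - mu) / sigma.
    """
    if sigma <= 0.0:
        # No predictive uncertainty: improvement is deterministic.
        return max(f_best - mu, 0.0)
    z = (f_best - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)   # standard normal phi(z)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))          # standard normal Phi(z)
    return (f_best - mu) * cdf + sigma * pdf

def predicted_value(mu):
    """Purely exploitative criterion: rank candidates by the surrogate mean alone."""
    return mu
```

The contrast the abstract draws is visible in the formulas: expected improvement adds a `sigma`-weighted exploration term, whereas the predicted-value criterion ignores uncertainty entirely and simply trusts the surrogate's mean.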
