A new acquisition function for Bayesian optimization based on the moment-generating function

Bayesian optimization, also known as Efficient Global Optimization (EGO), is a global search strategy designed for expensive black-box functions. In this algorithm, a statistical model (usually a Gaussian process) is fitted to a set of initial data samples. The global optimum is then approached by iteratively maximizing a so-called acquisition function, which balances exploration and exploitation in the search. The performance of such an algorithm depends strongly on the choice of acquisition function. Inspired by the use of higher moments of the Gaussian process model, we propose a novel acquisition function based on the moment-generating function (MGF) of the improvement, which is the stochastic gain over the current best fitness value obtained by sampling at a previously unevaluated point. This MGF-based acquisition function takes all higher moments into account and introduces an additional real-valued parameter that controls the trade-off between exploration and exploitation. The motivation, rationale and closed-form expression of the proposed function are discussed in detail. We also illustrate its advantages over other acquisition functions, in particular the so-called generalized expected improvement.
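
To make the idea of an MGF of the improvement concrete, the following is a minimal sketch and not the paper's exact acquisition function. It assumes a minimization problem and a Gaussian predictive distribution with mean `mean` and standard deviation `std > 0` at the candidate point, and it defines the improvement as I = max(0, f_best - Y). Under those assumptions the MGF E[exp(t I)] has a standard closed form. The function names and the argument `t` are illustrative choices; `t` stands in for the real-valued trade-off parameter, and the proposed function may normalize or modify this quantity.

```python
import math


def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def norm_pdf(z):
    """Standard normal probability density function."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def expected_improvement(mean, std, f_best):
    """Classical expected improvement E[I] for minimization."""
    gap = f_best - mean
    z = gap / std
    return gap * norm_cdf(z) + std * norm_pdf(z)


def improvement_mgf(mean, std, f_best, t):
    """E[exp(t * I)] with I = max(0, f_best - Y), Y ~ N(mean, std^2).

    Assumption: this is the raw MGF of the improvement, not
    necessarily the acquisition function defined in the paper.
    """
    gap = f_best - mean
    # Probability mass where no improvement occurs (I = 0, so exp(t * I) = 1).
    no_gain = norm_cdf(-gap / std)
    # Contribution of the region where the prediction beats f_best.
    gain = math.exp(gap * t + 0.5 * std ** 2 * t ** 2) * norm_cdf(
        (gap + std ** 2 * t) / std
    )
    return no_gain + gain
```

Two sanity checks follow from the definition: the MGF equals 1 at `t = 0`, and its derivative with respect to `t` at `t = 0` recovers the expected improvement, which is the first moment. Larger values of `t` put more weight on the higher moments and hence on points with large predictive uncertainty.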
