Initialization of Bayesian Optimization Viewed as Part of a Larger Algorithm Portfolio

We consider the problem of setting the hyperparameters of surrogate-assisted evolutionary optimization algorithms and of related methods such as Bayesian optimization (Gaussian process regression combined with an acquisition function that selects the next solutions to evaluate), which are widely used for problems with expensive function evaluations. It has been noted elsewhere that these algorithms can be started from an initial experimental design, from a random sample, or from as few as two points; we investigate how such choices should be made. By viewing random initialization of the design (or population) as running a single-point random searcher for the first few samples, and extending this view to other initialization methods, the combination of initializer and Bayesian optimizer can be seen as an algorithm portfolio, and lessons can be drawn from that literature. Nevertheless, we start largely afresh with experiments that combine different sampling methods (random search, Latin hypercube sampling) with Gaussian processes and different acquisition functions (expected improvement and a generalisation of it). We consider a range of experimental setups (test functions and dimensionalities) and aim to build a rough picture of which combinations work well where. Our work complements earlier Evolutionary Computation work on initialization with quasi-random (subrandom) sequences and experimental designs, but addresses more modern algorithms for expensive problems.
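To make the pipeline described above concrete, the following is a minimal sketch, not the paper's code: it spends a configurable number of evaluations on an initializer (uniform random or Latin hypercube) and the rest on a Gaussian process surrogate with expected improvement. The helper names, the toy sphere problem, the candidate-set maximisation of the acquisition, and the use of scikit-learn's GaussianProcessRegressor and SciPy's qmc.LatinHypercube are all our own illustrative assumptions.

```python
"""Sketch: initialization strategy + GP surrogate + expected improvement (minimisation)."""
import numpy as np
from scipy.stats import norm, qmc
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern


def initial_design(n, dim, method, rng):
    """Return n points in [0, 1]^dim from the chosen initialization method."""
    if method == "random":  # single-point random searcher run for n samples
        return rng.random((n, dim))
    if method == "lhs":     # Latin hypercube design
        return qmc.LatinHypercube(d=dim, seed=rng).random(n)
    raise ValueError(f"unknown initialization method: {method}")


def expected_improvement(mu, sigma, f_best, xi=0.0):
    """EI for minimisation; xi > 0 gives a simple generalised, more exploratory variant."""
    sigma = np.maximum(sigma, 1e-12)
    z = (f_best - mu - xi) / sigma
    return (f_best - mu - xi) * norm.cdf(z) + sigma * norm.pdf(z)


def bayes_opt(f, dim, budget, n_init, init_method="lhs", seed=0):
    """Spend n_init evaluations on the initializer, the remainder on GP + EI."""
    rng = np.random.default_rng(seed)
    X = initial_design(n_init, dim, init_method, rng)
    y = np.array([f(x) for x in X])
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
    while len(y) < budget:
        gp.fit(X, y)
        cand = rng.random((2048, dim))  # cheap candidate-set maximisation of EI
        mu, sigma = gp.predict(cand, return_std=True)
        x_next = cand[np.argmax(expected_improvement(mu, sigma, y.min()))]
        X = np.vstack([X, x_next])
        y = np.append(y, f(x_next))
    return X[np.argmin(y)], y.min()


if __name__ == "__main__":
    sphere = lambda x: float(np.sum((x - 0.5) ** 2))  # stand-in for an expensive problem
    x_best, f_best = bayes_opt(sphere, dim=3, budget=30, n_init=8, init_method="lhs")
    print(x_best, f_best)
```

Varying `n_init` and `init_method` in such a sketch corresponds to allocating the evaluation budget between the portfolio members (initializer versus Bayesian optimizer) in the sense discussed above.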
