Advancing Bayesian Optimization: The Mixed-Global-Local (MGL) Kernel and Length-Scale Cool Down

Bayesian Optimization (BO) has become a core method for solving expensive black-box optimization problems. While much research focussed on the choice of the acquisition function, we focus on online length-scale adaption and the choice of kernel function. Instead of choosing hyperparameters in view of maximum likelihood on past data, we propose to use the acquisition function to decide on hyperparameter adaptation more robustly and in view of the future optimization progress. Further, we propose a particular kernel function that includes non-stationarity and local anisotropy and thereby implicitly integrates the efficiency of local convex optimization with global Bayesian optimization. Comparisons to state-of-the art BO methods underline the efficiency of these mechanisms on global optimization benchmarks.

[1]  Adam D. Bull,et al.  Convergence Rates of Efficient Global Optimization Algorithms , 2011, J. Mach. Learn. Res..

[2]  Ruben Martinez-Cantin Locally-Biased Bayesian Optimization using Nonstationary Gaussian Processes , 2015 .

[3]  Leslie Pack Kaelbling,et al.  Bayesian Optimization with Exponential Convergence , 2015, NIPS.

[4]  François Bachoc,et al.  Cross Validation and Maximum Likelihood estimations of hyper-parameters of Gaussian processes with model misspecification , 2013, Comput. Stat. Data Anal..

[5]  Jasper Snoek,et al.  Input Warping for Bayesian Optimization of Non-Stationary Functions , 2014, ICML.

[6]  Nando de Freitas,et al.  Heteroscedastic Treed Bayesian Optimisation , 2014, ArXiv.

[7]  Donald R. Jones,et al.  Global optimization of deceptive functions with sparse sampling , 2008 .

[8]  Matthew W. Hoffman,et al.  Predictive Entropy Search for Efficient Global Optimization of Black-box Functions , 2014, NIPS.

[9]  Nils-Hassan Quttineh,et al.  Implementation of a One-Stage Efficient Global Optimization (EGO) Algorithm , 2009 .

[10]  D K Smith,et al.  Numerical Optimization , 2001, J. Oper. Res. Soc..

[11]  Andreas Krause,et al.  Nonmyopic active learning of Gaussian processes: an exploration-exploitation approach , 2007, ICML '07.

[12]  Rodolphe Le Riche,et al.  Small ensembles of kriging models for optimization , 2016, ArXiv.

[13]  Nando de Freitas,et al.  Bayesian Optimization in a Billion Dimensions via Random Embeddings , 2013, J. Artif. Intell. Res..

[14]  Nando de Freitas,et al.  Bayesian Multi-Scale Optimistic Optimization , 2014, AISTATS.

[15]  Carl E. Rasmussen,et al.  Gaussian processes for machine learning , 2005, Adaptive computation and machine learning.

[16]  Donald R. Jones,et al.  A Taxonomy of Global Optimization Methods Based on Response Surfaces , 2001, J. Glob. Optim..