CORRECTING BOUNDARY OVER-EXPLORATION DEFICIENCIES IN BAYESIAN OPTIMIZATION WITH VIRTUAL DERIVATIVE SIGN OBSERVATIONS

Bayesian optimization (BO) is a global optimization strategy for finding the minimum of an expensive black-box function, typically defined on a compact subset of ℝ^d, using a Gaussian process (GP) as a surrogate model for the objective. Although currently available acquisition functions address this goal with different degrees of success, they typically over-explore the boundary of the search space. In problems such as the configuration of machine learning algorithms, however, the function domain is chosen conservatively large, and with high probability the global minimum does not lie on the boundary of the domain. We propose a method to incorporate this knowledge into the search process by adding virtual derivative observations to the GP at the boundary of the search space, exploiting the properties of GPs to impose conditions on the partial derivatives of the objective. The method is compatible with any acquisition function, is easy to use, and consistently reduces the number of evaluations required to optimize the objective. We illustrate the benefits of our approach in an extensive experimental comparison.
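To make the idea concrete, the following minimal sketch (Python/NumPy, written for this summary and not taken from the paper's implementation) shows how virtual derivative information at the boundary enters a GP posterior in one dimension. The paper attaches sign observations to the boundary derivatives, which requires approximate inference (e.g., expectation propagation); the sketch below instead conditions on virtual derivative values pointing uphill at the boundary, which keeps the posterior Gaussian and in closed form. All names (e.g., gp_posterior) and the toy objective are illustrative assumptions.

    import numpy as np

    def k_ff(a, b, s2=1.0, l2=1.0):
        # RBF kernel with variance s2 and squared lengthscale l2:
        # k(a, b) = s2 * exp(-(a - b)^2 / (2 * l2))
        d = a[:, None] - b[None, :]
        return s2 * np.exp(-d ** 2 / (2 * l2))

    def k_fd(a, b, s2=1.0, l2=1.0):
        # Cov(f(a), f'(b)) = dk/db = k(a, b) * (a - b) / l2
        d = a[:, None] - b[None, :]
        return k_ff(a, b, s2, l2) * d / l2

    def k_dd(a, b, s2=1.0, l2=1.0):
        # Cov(f'(a), f'(b)) = d^2 k/(da db) = k(a, b) * (1/l2 - (a - b)^2 / l2^2)
        d = a[:, None] - b[None, :]
        return k_ff(a, b, s2, l2) * (1.0 / l2 - d ** 2 / l2 ** 2)

    def gp_posterior(x_star, X, y, Xd, yd, noise=1e-6):
        # Derivatives of a GP are jointly Gaussian with the function itself,
        # so function observations (X, y) and virtual derivative observations
        # (Xd, yd) can be stacked into one joint covariance matrix.
        K = np.block([[k_ff(X, X),     k_fd(X, Xd)],
                      [k_fd(X, Xd).T,  k_dd(Xd, Xd)]])
        K += noise * np.eye(K.shape[0])
        k_star = np.hstack([k_ff(x_star, X), k_fd(x_star, Xd)])
        alpha = np.linalg.solve(K, np.concatenate([y, yd]))
        mean = k_star @ alpha
        var = k_ff(x_star, x_star).diagonal() - np.einsum(
            'ij,ij->i', k_star, np.linalg.solve(K, k_star.T).T)
        return mean, var

    # Search interval [0, 1]: virtual derivatives slope downward into the
    # interior (f'(0) < 0, f'(1) > 0), encoding the belief that the minimum
    # does not sit on the boundary.
    X = np.array([0.3, 0.6, 0.9])                          # evaluated inputs
    y = np.sin(6 * X)                                      # toy objective values
    Xd, yd = np.array([0.0, 1.0]), np.array([-1.0, 1.0])   # virtual boundary obs
    mu, var = gp_posterior(np.linspace(0.0, 1.0, 101), X, y, Xd, yd)

Under such a GP, any standard acquisition function (expected improvement, GP-UCB, and so on) can be evaluated as usual; the virtual observations simply raise the posterior mean near the boundary, steering the acquisition away from it without modifying the acquisition itself.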
