The Parallel Knowledge Gradient Method for Batch Bayesian Optimization

In many applications of black-box optimization, one can evaluate multiple points simultaneously, e.g. when evaluating the performance of several different neural network architectures in a parallel computing environment. In this paper, we develop a novel batch Bayesian optimization algorithm: the parallel knowledge gradient method. By construction, this method provides the one-step Bayes-optimal batch of points to sample. We provide an efficient strategy for computing this Bayes-optimal batch of points, and we demonstrate that the parallel knowledge gradient method finds global optima significantly faster than previous batch Bayesian optimization algorithms, both on synthetic test functions and when tuning hyperparameters of practical machine learning algorithms, especially when function evaluations are noisy.
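
For concreteness, "one-step Bayes-optimal batch" corresponds, in the standard knowledge-gradient formulation, to choosing the batch that maximizes the expected decrease in the minimum of the Gaussian process posterior mean once the batch is observed. Writing \mu_n for the posterior mean after n observations, A for the feasible domain, and z^{(1:q)} for a candidate batch of q points (notation introduced here for illustration, not taken from the abstract), the acquisition value is

    \mathrm{q\text{-}KG}\big(z^{(1:q)}\big) \;=\; \min_{x \in A} \mu_n(x) \;-\; \mathbb{E}_n\!\Big[\, \min_{x \in A} \mu_{n+q}(x) \;\Big|\; \text{observe } f \text{ at } z^{(1:q)} \Big].

The sketch below is a naive Monte Carlo estimator of this quantity for a fixed candidate batch, using a simple RBF-kernel Gaussian process and a finite discretization A of the domain. It is an illustration under those assumptions, not the authors' implementation; the function names (rbf_kernel, posterior, qkg_estimate) and all hyperparameter values are hypothetical.

    import numpy as np

    def rbf_kernel(A, B, lengthscale=0.5, variance=1.0):
        # Squared-exponential kernel between row-wise point sets A (n x d) and B (m x d).
        d2 = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
        return variance * np.exp(-0.5 * d2 / lengthscale**2)

    def posterior(X, y, Xstar, noise=1e-2):
        # GP posterior mean and covariance at Xstar given noisy observations (X, y).
        K = rbf_kernel(X, X) + noise * np.eye(len(X))
        Ks = rbf_kernel(X, Xstar)
        L = np.linalg.cholesky(K)
        alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
        V = np.linalg.solve(L, Ks)
        mean = Ks.T @ alpha
        cov = rbf_kernel(Xstar, Xstar) - V.T @ V
        return mean, cov

    def qkg_estimate(X, y, Z, A, noise=1e-2, n_fantasies=64, seed=0):
        # Monte Carlo estimate of the batch knowledge-gradient value of batch Z:
        # the expected decrease in the minimum of the posterior mean over the grid A
        # if the batch were observed. Larger is better.
        rng = np.random.default_rng(seed)
        mu_now, _ = posterior(X, y, A, noise)
        mu_Z, cov_Z = posterior(X, y, Z, noise)
        # Fantasy observations are drawn from the noisy predictive distribution at Z.
        Lz = np.linalg.cholesky(cov_Z + (noise + 1e-10) * np.eye(len(Z)))
        best_next = np.empty(n_fantasies)
        for m in range(n_fantasies):
            y_fantasy = mu_Z + Lz @ rng.standard_normal(len(Z))
            mu_next, _ = posterior(np.vstack([X, Z]), np.concatenate([y, y_fantasy]), A, noise)
            best_next[m] = mu_next.min()
        return mu_now.min() - best_next.mean()

    # Example: value of a candidate batch of q = 2 points for a toy 1-D problem.
    rng = np.random.default_rng(1)
    X = rng.uniform(0.0, 1.0, size=(5, 1))
    y = np.sin(6.0 * X[:, 0])
    A = np.linspace(0.0, 1.0, 101)[:, None]
    Z = np.array([[0.2], [0.8]])
    print(qkg_estimate(X, y, Z, A))

A larger estimate means observing the batch is expected to lower the minimum of the posterior mean more; in practice one would maximize this value over candidate batches Z, which is where an efficient computational strategy such as the one described in the paper becomes important.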
