Parallel Bayesian Global Optimization of Expensive Functions

We consider parallel global optimization of derivative-free, expensive-to-evaluate functions, and propose an efficient method based on stochastic approximation for implementing a conceptual Bayesian optimization algorithm introduced by Ginsbourger et al. (2007). At the heart of this algorithm is maximizing the information criterion called the "multi-points expected improvement", or q-EI. To accomplish this, we use infinitesimal perturbation analysis (IPA) to construct a stochastic gradient estimator and show that this estimator is unbiased. We also show that stochastic gradient ascent using the constructed gradient estimator converges to a stationary point of the q-EI surface; therefore, as the number of multistarts of the gradient ascent algorithm and the number of steps for each start grow large, the one-step Bayes-optimal set of points is recovered. Numerical experiments show that our method for maximizing the q-EI is faster than methods based on closed-form evaluation using high-dimensional integration when many parallel function evaluations are considered, and comparable in speed when few are. We also show that the resulting one-step Bayes-optimal algorithm for parallel global optimization finds high-quality solutions with fewer evaluations than a heuristic based on approximately maximizing the q-EI. A high-quality implementation of this algorithm is available in the open-source Metrics Optimization Engine (MOE).
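
To make the estimator concrete, the sketch below (a minimal illustration in Python with JAX, not the paper's MOE implementation) applies the pathwise/IPA idea described above: posterior draws at a candidate batch X are reparameterized as mu(X) + L(X) Z, where L is the Cholesky factor of the posterior covariance and Z is standard normal, so fixing Z and differentiating the sampled improvement with respect to X yields a stochastic gradient of the q-EI, which then drives multistart stochastic gradient ascent. The RBF kernel, jitter, step-size schedule, and all function names and parameters are illustrative assumptions, not the paper's choices.

```python
# Minimal sketch (illustrative, not the MOE implementation) of Monte Carlo
# q-EI with an IPA/pathwise stochastic gradient, using JAX for differentiation.
# Kernel, jitter, and step sizes are assumptions.
import jax
import jax.numpy as jnp

def rbf_kernel(A, B, lengthscale=0.5):
    # Squared-exponential kernel between row-stacked point sets A and B.
    d2 = jnp.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=-1)
    return jnp.exp(-0.5 * d2 / lengthscale ** 2)

def posterior(X, X_obs, y_obs, noise=1e-6):
    # GP posterior mean and covariance at the candidate batch X (q x d).
    K = rbf_kernel(X_obs, X_obs) + noise * jnp.eye(X_obs.shape[0])
    Kx = rbf_kernel(X_obs, X)
    solve = jnp.linalg.solve(K, Kx)
    mu = solve.T @ y_obs
    cov = rbf_kernel(X, X) - Kx.T @ solve
    return mu, cov + 1e-8 * jnp.eye(X.shape[0])  # jitter keeps Cholesky stable

def sampled_improvement(X, Z, f_best, X_obs, y_obs):
    # One pathwise sample of the q-point improvement (minimization).
    # Holding Z fixed and differentiating w.r.t. X is the IPA estimator.
    mu, cov = posterior(X, X_obs, y_obs)
    L = jnp.linalg.cholesky(cov)   # differentiable; cf. Smith [18]
    Y = mu + L @ Z                 # reparameterized posterior draw
    return jnp.maximum(f_best - jnp.min(Y), 0.0)

def qei_estimate(X, Zs, f_best, X_obs, y_obs):
    # Monte Carlo estimate of the q-EI from a batch of standard normal draws.
    sample = jax.vmap(lambda Z: sampled_improvement(X, Z, f_best, X_obs, y_obs))
    return jnp.mean(sample(Zs))

qei_grad = jax.jit(jax.grad(qei_estimate))  # stochastic gradient w.r.t. X

def multistart_sga(key, q, d, X_obs, y_obs, f_best,
                   n_starts=5, n_steps=200, step=0.1, n_mc=64):
    # Multistart stochastic gradient ascent on the q-EI surface over [0,1]^d.
    best_X, best_val = None, -jnp.inf
    for _ in range(n_starts):
        key, k_init = jax.random.split(key)
        X = jax.random.uniform(k_init, (q, d))   # random restart
        for t in range(1, n_steps + 1):
            key, k_mc = jax.random.split(key)
            Zs = jax.random.normal(k_mc, (n_mc, q))
            X = jnp.clip(X + (step / t) * qei_grad(X, Zs, f_best, X_obs, y_obs),
                         0.0, 1.0)               # decreasing Robbins-Monro steps
        key, k_eval = jax.random.split(key)
        val = qei_estimate(X, jax.random.normal(k_eval, (512, q)),
                           f_best, X_obs, y_obs)
        if val > best_val:
            best_X, best_val = X, val
    return best_X, best_val
```

In the paper's approach, the same pathwise derivative is constructed analytically, including differentiation of the Cholesky factor [18], rather than via automatic differentiation as in this sketch.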

[1] D. L. Hanson, et al. Nonparametric Upper Confidence Bounds for Pr{Y < X} and Confidence Limits for Pr{Y < X} When X and Y are Normal, 1964.

[2] Harold J. Kushner, et al. A New Method of Locating the Maximum Point of an Arbitrary Multipeak Curve in the Presence of Noise, 1964.

[3] Ronald A. Howard, et al. Information Value Theory, 1966, IEEE Trans. Syst. Sci. Cybern.

[4] Alexander H. G. Rinnooy Kan, et al. Bayesian stopping rules for multistart global optimization methods, 1987, Math. Program.

[5] Yu-Chi Ho, et al. Performance evaluation and perturbation analysis of discrete event dynamic systems, 1987.

[6] D. Ruppert, et al. Efficient Estimations from a Slowly Convergent Robbins-Monro Process, 1988.

[7] Jorge Nocedal, et al. On the limited memory BFGS method for large scale optimization, 1989, Math. Program.

[8] Jonas Mockus, et al. Bayesian Approach to Global Optimization, 1989.

[9] J. Mockus. The Bayesian Approach to Local Optimization, 1989.

[10] J. Mockus. Bayesian Approach to Global Optimization: Theory and Applications, 1989.

[11] Paul Glasserman, et al. Gradient Estimation Via Perturbation Analysis, 1990.

[12] John E. Dennis, et al. Direct Search Methods on Parallel Machines, 1991, SIAM J. Optim.

[13] J. Dennis, et al. Direct Search Methods on Parallel Machines, 1991.

[14] A. Genz. Numerical Computation of Multivariate Normal Probabilities, 1992.

[15] John H. Holland, et al. Adaptation in Natural and Artificial Systems: An Introductory Analysis with Applications to Biology, Control, and Artificial Intelligence, 1992.

[16] R. Ababou, et al. On the condition number of covariance matrices in kriging, estimation, and simulation of random fields, 1994.

[17] P. L’Ecuyer, et al. On the interchange of derivative and expectation for likelihood ratio derivative estimators, 1995.

[18] S. P. Smith. Differentiation of the Cholesky Algorithm, 1995.

[19] J. Calvin. Average performance of a class of adaptive algorithms for global optimization, 1997.

[20] Harold J. Kushner, et al. Stochastic Approximation Algorithms and Applications, 1997, Applications of Mathematics.

[21] William J. Welch, et al. Computer experiments and global optimization, 1997.

[22] Donald R. Jones, et al. Efficient Global Optimization of Expensive Black-Box Functions, 1998, J. Glob. Optim.

[23] Richard J. Beckman, et al. A Comparison of Three Methods for Selecting Values of Input Variables in the Analysis of Output From a Computer Code, 2000, Technometrics.

[24] Eric Jones, et al. SciPy: Open Source Scientific Tools for Python, 2001.

[25] James M. Calvin, et al. One-dimensional Global Optimization Based on Statistical Models, 2002.

[26] Thomas J. Santner, et al. Design and analysis of computer experiments, 1998.

[27] H. Kushner, et al. Stochastic Approximation and Recursive Algorithms and Applications, 2003.

[28] Philippe Flajolet, et al. Adaptive Sampling, 1997.

[29] A. Žilinskas, et al. One-Dimensional global optimization for observations with noise, 2005.

[30] N. Zheng, et al. Global Optimization of Stochastic Black-Box Systems via Sequential Kriging Meta-Models, 2006, J. Glob. Optim.

[31] Barry L. Nelson, et al. Recent advances in ranking and selection, 2007, 2007 Winter Simulation Conference.

[32] David Ginsbourger, et al. A Multi-points Criterion for Deterministic Parallel Global Optimization based on Kriging, 2007.

[33] Riccardo Poli, et al. Particle swarm optimization, 2007, Swarm Intelligence.

[34] Eric Walter, et al. Global optimization based on noisy evaluations: An empirical study of two statistical approaches, 2008.

[35] Eric Walter, et al. An informational approach to the global optimization of expensive-to-evaluate functions, 2006, J. Glob. Optim.

[36] Carl E. Rasmussen, et al. Gaussian processes for machine learning, 2005, Adaptive computation and machine learning.

[37] Kellen Petersen. Real Analysis, 2009.

[38] Warren B. Powell, et al. The Knowledge-Gradient Policy for Correlated Normal Beliefs, 2009, INFORMS J. Comput.

[39] Nando de Freitas, et al. A Bayesian interactive optimization approach to procedural animation design, 2010, SCA '10.

[40] E. Vázquez, et al. Convergence properties of the expected improvement algorithm with fixed mean and covariance functions, 2007, arXiv:0712.3744.

[41] D. Ginsbourger, et al. Kriging is well-suited to parallelize optimization, 2010.

[42] Peter I. Frazier, et al. Value of information methods for pairwise sampling with correlations, 2011, Proceedings of the 2011 Winter Simulation Conference (WSC).

[43] David Ginsbourger, et al. Expected Improvements for the Asynchronous Parallel Global Optimization of Expensive Functions: Potentials and Challenges, 2012, LION.

[44] Jasper Snoek, et al. Practical Bayesian Optimization of Machine Learning Algorithms, 2012, NIPS.

[45] David Ginsbourger, et al. Fast Computation of the Multi-Points Expected Improvement with Applications in Batch Selection, 2013, LION.

[46] Xin-She Yang, et al. A literature survey of benchmark functions for global optimisation problems, 2013, Int. J. Math. Model. Numer. Optimisation.

[47] Peter I. Frazier, et al. Bayesian optimization for materials design, 2015, arXiv:1506.01349.

[48] David Ginsbourger, et al. Differentiating the Multipoint Expected Improvement for Optimal Batch Design, 2015, MOD.

[49] Peter I. Frazier, et al. Bayesian Optimization via Simulation with Pairwise Sampling and Correlated Prior Beliefs, 2016, Oper. Res.