Parallel Bayesian Global Optimization of Expensive Functions

We consider parallel global optimization of derivative-free, expensive-to-evaluate functions, and propose an efficient method based on stochastic approximation for implementing a conceptual Bayesian optimization algorithm introduced by Ginsbourger et al. (2007). At the heart of this algorithm is maximizing the information criterion called the "multi-points expected improvement", or q-EI. To accomplish this, we use infinitesimal perturbation analysis (IPA) to construct a stochastic gradient estimator and show that this estimator is unbiased. We also show that stochastic gradient ascent using the constructed gradient estimator converges to a stationary point of the q-EI surface; therefore, as the number of multistarts of the gradient ascent algorithm and the number of steps for each start grow large, the one-step Bayes-optimal set of points is recovered. Numerical experiments show that our method for maximizing the q-EI is faster than methods based on closed-form evaluation using high-dimensional integration when many parallel function evaluations are considered, and comparable in speed when few are. We also show that the resulting one-step Bayes-optimal algorithm for parallel global optimization finds high-quality solutions with fewer evaluations than a heuristic based on approximately maximizing the q-EI. A high-quality implementation of this algorithm is available in the open-source Metrics Optimization Engine (MOE).
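
To make the estimator concrete, the sketch below (a minimal illustration in Python with JAX, not the paper's MOE implementation) applies the pathwise/IPA idea described above: posterior draws at a candidate batch X are reparameterized as mu(X) + L(X) Z, where L is the Cholesky factor of the posterior covariance and Z is standard normal, so fixing Z and differentiating the sampled improvement with respect to X yields a stochastic gradient of the q-EI, which then drives multistart stochastic gradient ascent. The RBF kernel, jitter, step-size schedule, and all function names and parameters are illustrative assumptions, not the paper's choices.

```python
# Minimal sketch (illustrative, not the MOE implementation) of Monte Carlo
# q-EI with an IPA/pathwise stochastic gradient, using JAX for differentiation.
# Kernel, jitter, and step sizes are assumptions.
import jax
import jax.numpy as jnp

def rbf_kernel(A, B, lengthscale=0.5):
    # Squared-exponential kernel between row-stacked point sets A and B.
    d2 = jnp.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=-1)
    return jnp.exp(-0.5 * d2 / lengthscale ** 2)

def posterior(X, X_obs, y_obs, noise=1e-6):
    # GP posterior mean and covariance at the candidate batch X (q x d).
    K = rbf_kernel(X_obs, X_obs) + noise * jnp.eye(X_obs.shape[0])
    Kx = rbf_kernel(X_obs, X)
    solve = jnp.linalg.solve(K, Kx)
    mu = solve.T @ y_obs
    cov = rbf_kernel(X, X) - Kx.T @ solve
    return mu, cov + 1e-8 * jnp.eye(X.shape[0])  # jitter keeps Cholesky stable

def sampled_improvement(X, Z, f_best, X_obs, y_obs):
    # One pathwise sample of the q-point improvement (minimization).
    # Holding Z fixed and differentiating w.r.t. X is the IPA estimator.
    mu, cov = posterior(X, X_obs, y_obs)
    L = jnp.linalg.cholesky(cov)   # differentiable; cf. Smith [18]
    Y = mu + L @ Z                 # reparameterized posterior draw
    return jnp.maximum(f_best - jnp.min(Y), 0.0)

def qei_estimate(X, Zs, f_best, X_obs, y_obs):
    # Monte Carlo estimate of the q-EI from a batch of standard normal draws.
    sample = jax.vmap(lambda Z: sampled_improvement(X, Z, f_best, X_obs, y_obs))
    return jnp.mean(sample(Zs))

qei_grad = jax.jit(jax.grad(qei_estimate))  # stochastic gradient w.r.t. X

def multistart_sga(key, q, d, X_obs, y_obs, f_best,
                   n_starts=5, n_steps=200, step=0.1, n_mc=64):
    # Multistart stochastic gradient ascent on the q-EI surface over [0,1]^d.
    best_X, best_val = None, -jnp.inf
    for _ in range(n_starts):
        key, k_init = jax.random.split(key)
        X = jax.random.uniform(k_init, (q, d))   # random restart
        for t in range(1, n_steps + 1):
            key, k_mc = jax.random.split(key)
            Zs = jax.random.normal(k_mc, (n_mc, q))
            X = jnp.clip(X + (step / t) * qei_grad(X, Zs, f_best, X_obs, y_obs),
                         0.0, 1.0)               # decreasing Robbins-Monro steps
        key, k_eval = jax.random.split(key)
        val = qei_estimate(X, jax.random.normal(k_eval, (512, q)),
                           f_best, X_obs, y_obs)
        if val > best_val:
            best_X, best_val = X, val
    return best_X, best_val
```

In the paper's approach, the same pathwise derivative is constructed analytically, including differentiation of the Cholesky factor [18], rather than via automatic differentiation as in this sketch.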

[1] D. L. Hanson, et al. Nonparametric Upper Confidence Bounds for Pr{Y < X} and Confidence Limits for Pr{Y < X} When X and Y are Normal, 1964.

[2] Harold J. Kushner, et al. A New Method of Locating the Maximum Point of an Arbitrary Multipeak Curve in the Presence of Noise, 1964.

[3] Ronald A. Howard, et al. Information Value Theory, 1966, IEEE Trans. Syst. Sci. Cybern.

[4] Alexander H. G. Rinnooy Kan, et al. Bayesian stopping rules for multistart global optimization methods, 1987, Math. Program.

[5] Yu-Chi Ho, et al. Performance evaluation and perturbation analysis of discrete event dynamic systems, 1987.

[6] D. Ruppert, et al. Efficient Estimations from a Slowly Convergent Robbins-Monro Process, 1988.

[7] Jorge Nocedal, et al. On the limited memory BFGS method for large scale optimization, 1989, Math. Program.

[8] Jonas Mockus, et al. Bayesian Approach to Global Optimization, 1989.

[9] J. Mockus. The Bayesian Approach to Local Optimization, 1989.

[10] J. Mockus. Bayesian Approach to Global Optimization: Theory and Applications, 1989.

[11] Paul Glasserman, et al. Gradient Estimation Via Perturbation Analysis, 1990.

[12] John E. Dennis, et al. Direct Search Methods on Parallel Machines, 1991, SIAM J. Optim.

[13] J. Dennis, et al. Direct Search Methods on Parallel Machines, 1991.

[14] A. Genz. Numerical Computation of Multivariate Normal Probabilities, 1992.

[15] John H. Holland, et al. Adaptation in Natural and Artificial Systems: An Introductory Analysis with Applications to Biology, Control, and Artificial Intelligence, 1992.

[16] R. Ababou, et al. On the condition number of covariance matrices in kriging, estimation, and simulation of random fields, 1994.

[17] P. L’Ecuyer, et al. On the interchange of derivative and expectation for likelihood ratio derivative estimators, 1995.

[18] S. P. Smith. Differentiation of the Cholesky Algorithm, 1995.

[19] J. Calvin. Average performance of a class of adaptive algorithms for global optimization, 1997.

[20] Harold J. Kushner, et al. Stochastic Approximation Algorithms and Applications, 1997, Applications of Mathematics.

[21] William J. Welch, et al. Computer experiments and global optimization, 1997.

[22] Donald R. Jones, et al. Efficient Global Optimization of Expensive Black-Box Functions, 1998, J. Glob. Optim.

[23] Richard J. Beckman, et al. A Comparison of Three Methods for Selecting Values of Input Variables in the Analysis of Output From a Computer Code, 2000, Technometrics.

[24] Eric Jones, et al. SciPy: Open Source Scientific Tools for Python, 2001.

[25] James M. Calvin, et al. One-dimensional Global Optimization Based on Statistical Models, 2002.

[26] Thomas J. Santner, et al. Design and analysis of computer experiments, 1998.

[27] H. Kushner, et al. Stochastic Approximation and Recursive Algorithms and Applications, 2003.

[28] Philippe Flajolet, et al. Adaptive Sampling, 1997.

[29] A. Žilinskas, et al. One-Dimensional global optimization for observations with noise, 2005.

[30] N. Zheng, et al. Global Optimization of Stochastic Black-Box Systems via Sequential Kriging Meta-Models, 2006, J. Glob. Optim.

[31] Barry L. Nelson, et al. Recent advances in ranking and selection, 2007, 2007 Winter Simulation Conference.

[32] David Ginsbourger, et al. A Multi-points Criterion for Deterministic Parallel Global Optimization based on Kriging, 2007.

[33] Riccardo Poli, et al. Particle swarm optimization, 2007, Swarm Intelligence.

[34] Eric Walter, et al. Global optimization based on noisy evaluations: An empirical study of two statistical approaches, 2008.

[35] Eric Walter, et al. An informational approach to the global optimization of expensive-to-evaluate functions, 2006, J. Glob. Optim.

[36] Carl E. Rasmussen, et al. Gaussian processes for machine learning, 2005, Adaptive computation and machine learning.

[37] Kellen Petersen. Real Analysis, 2009.

[38] Warren B. Powell, et al. The Knowledge-Gradient Policy for Correlated Normal Beliefs, 2009, INFORMS J. Comput.

[39] Nando de Freitas, et al. A Bayesian interactive optimization approach to procedural animation design, 2010, SCA '10.

[40] E. Vázquez, et al. Convergence properties of the expected improvement algorithm with fixed mean and covariance functions, 2007, arXiv:0712.3744.

[41] D. Ginsbourger, et al. Kriging is well-suited to parallelize optimization, 2010.

[42] Peter I. Frazier, et al. Value of information methods for pairwise sampling with correlations, 2011, Proceedings of the 2011 Winter Simulation Conference (WSC).

[43] David Ginsbourger, et al. Expected Improvements for the Asynchronous Parallel Global Optimization of Expensive Functions: Potentials and Challenges, 2012, LION.

[44] Jasper Snoek, et al. Practical Bayesian Optimization of Machine Learning Algorithms, 2012, NIPS.

[45] David Ginsbourger, et al. Fast Computation of the Multi-Points Expected Improvement with Applications in Batch Selection, 2013, LION.

[46] Xin-She Yang, et al. A literature survey of benchmark functions for global optimisation problems, 2013, Int. J. Math. Model. Numer. Optimisation.

[47] Peter I. Frazier, et al. Bayesian optimization for materials design, 2015, arXiv:1506.01349.

[48] David Ginsbourger, et al. Differentiating the Multipoint Expected Improvement for Optimal Batch Design, 2015, MOD.

[49] Peter I. Frazier, et al. Bayesian Optimization via Simulation with Pairwise Sampling and Correlated Prior Beliefs, 2016, Oper. Res.