Efficient batch-sequential Bayesian optimization with moments of truncated Gaussian vectors

We address the efficient parallelization of Bayesian global optimization algorithms, and more specifically of those based on the expected improvement criterion and its variants. A closed-form formula relying on multivariate Gaussian cumulative distribution functions is established for a generalized version of the multipoint expected improvement criterion. This formula in turn relies on intermediate results on moments of truncated Gaussian vectors, which may be of independent interest. The resulting expansion of the criterion makes it possible to study its differentiability with respect to point batches and to calculate the corresponding gradient in closed form. Furthermore, we derive fast numerical approximations of this gradient and propose efficient batch optimization strategies. Numerical experiments illustrate that the proposed approaches yield computational savings of one to two orders of magnitude, making derivative-based batch-sequential acquisition function maximization a practically implementable and efficient standard.
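
To make the criterion concrete, the following minimal sketch evaluates the classical single-point expected improvement in closed form and estimates a multipoint version by plain Monte Carlo. It is not the closed-form expansion established in the paper; the Monte Carlo estimator is swapped in here only as a simple baseline. The minimization convention, in which the improvement of a batch is max(0, best - min_i Y_i), and the names `expected_improvement`, `cholesky`, `multipoint_ei_mc`, `best`, `means`, and `cov` are assumptions made for illustration.

```python
import math
import random


def expected_improvement(mean, std, best):
    """Closed-form single-point EI, E[max(0, best - Y)] with Y ~ N(mean, std^2).

    Assumes minimization; `best` is the lowest value observed so far.
    """
    if std <= 0.0:
        return max(0.0, best - mean)
    z = (best - mean) / std
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return (best - mean) * cdf + std * pdf


def cholesky(cov):
    """Lower-triangular Cholesky factor of a small positive-definite matrix."""
    n = len(cov)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(cov[i][i] - s)
            else:
                low[i][j] = (cov[i][j] - s) / low[j][j]
    return low


def multipoint_ei_mc(means, cov, best, n_samples=20000, seed=0):
    """Monte Carlo estimate of E[max(0, best - min_i Y_i)], Y ~ N(means, cov).

    Illustrative baseline only; the paper instead derives a closed form.
    """
    rng = random.Random(seed)
    low = cholesky(cov)
    q = len(means)
    total = 0.0
    for _ in range(n_samples):
        z = [rng.gauss(0.0, 1.0) for _ in range(q)]
        # Correlate the standard normal draws: y = means + low @ z.
        y = [means[i] + sum(low[i][k] * z[k] for k in range(i + 1))
             for i in range(q)]
        total += max(0.0, best - min(y))
    return total / n_samples
```

For a batch of size one the two functions should agree up to sampling error: `expected_improvement(0.0, 1.0, 0.0)` equals 1/sqrt(2π), roughly 0.399, and `multipoint_ei_mc([0.0], [[1.0]], 0.0)` should land close to that value. The Monte Carlo estimate is noisy and awkward to differentiate with respect to the batch, which is the kind of cost that a closed-form criterion and gradient aim to avoid.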
