Efficient Nonmyopic Bayesian Optimization via One-Shot Multi-Step Trees

Bayesian optimization is a sequential decision-making framework for optimizing expensive-to-evaluate black-box functions. Computing a full lookahead policy amounts to solving a highly intractable stochastic dynamic program. Myopic approaches, such as expected improvement, are often adopted in practice, but they ignore the long-term impact of the immediate decision. Existing nonmyopic approaches are mostly heuristic and/or computationally expensive. In this paper, we provide the first efficient implementation of general multi-step lookahead Bayesian optimization, formulated as a sequence of nested optimization problems within a multi-step scenario tree. Instead of solving these problems in a nested way, we equivalently optimize all decision variables in the full tree jointly, in a "one-shot" fashion. Combining this with an efficient method for implementing multi-step Gaussian process "fantasization," we demonstrate that multi-step expected improvement is computationally tractable and exhibits performance superior to existing methods on a wide range of benchmarks.
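To make the one-shot idea concrete, the following is a minimal, self-contained sketch (plain PyTorch with a toy RBF Gaussian process, not the paper's actual implementation) of a two-step lookahead tree. Rather than solving an inner maximization for every fantasy sample, the next-step candidates `x1[i]` are lifted into the decision space and optimized jointly with the current candidate `x0` by gradient ascent; holding the base samples `z` fixed (the reparameterization trick) makes the whole tree objective deterministic and differentiable. All names here (`rbf`, `gp_posterior`, `ei`, `n_fantasies`) are illustrative assumptions, not from the paper.

```python
import torch
from torch.distributions import Normal

torch.manual_seed(0)

def rbf(X1, X2, lengthscale=0.3):
    # Squared-exponential kernel on pairwise Euclidean distances.
    return torch.exp(-0.5 * torch.cdist(X1, X2).pow(2) / lengthscale**2)

def gp_posterior(x, X, y, noise=1e-4):
    # Exact GP posterior mean and stddev at query points x.
    K = rbf(X, X) + noise * torch.eye(len(X))
    K_inv = torch.linalg.inv(K)  # fine at toy sizes; use a Cholesky solve in practice
    k_x = rbf(x, X)
    mean = k_x @ K_inv @ y
    var = (rbf(x, x).diagonal() - (k_x @ K_inv @ k_x.T).diagonal()).clamp_min(1e-10)
    return mean, var.sqrt()

def ei(mean, std, best):
    # Closed-form expected improvement (maximization convention).
    u = (mean - best) / std
    normal = Normal(0.0, 1.0)
    return std * (u * normal.cdf(u) + normal.log_prob(u).exp())

# Toy observations on [0, 1].
X = torch.rand(5, 1)
y = torch.sin(6.0 * X).squeeze(-1)
best = y.max()

# One-shot decision variables for the whole two-step tree: the current
# candidate x0, plus one next-step candidate per fantasy branch.
n_fantasies = 8
x0 = torch.rand(1, 1, requires_grad=True)
x1 = torch.rand(n_fantasies, 1, requires_grad=True)
z = torch.randn(n_fantasies)  # fixed base samples (reparameterization trick)

optimizer = torch.optim.Adam([x0, x1], lr=0.05)
for _ in range(200):
    optimizer.zero_grad()
    mean0, std0 = gp_posterior(x0, X, y)
    value = ei(mean0, std0, best).sum()  # immediate one-step EI at x0
    for i in range(n_fantasies):
        y_fant = mean0 + std0 * z[i]   # differentiable fantasized outcome at x0
        X_aug = torch.cat([X, x0])     # GP "fantasization": condition on (x0, y_fant)
        y_aug = torch.cat([y, y_fant])
        mean1, std1 = gp_posterior(x1[i : i + 1], X_aug, y_aug)
        value = value + ei(mean1, std1, torch.maximum(best, y_fant.squeeze())).sum() / n_fantasies
    (-value).backward()  # ascend the full-tree objective jointly over (x0, x1)
    optimizer.step()
    with torch.no_grad():  # keep candidates inside the box [0, 1]
        x0.clamp_(0.0, 1.0)
        x1.clamp_(0.0, 1.0)

print("two-step one-shot candidate:", x0.detach().squeeze())
```

For deeper trees and production use, BoTorch ships a qMultiStepLookahead acquisition function along these lines; the sketch above only illustrates the joint (one-shot) parameterization of the tree.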
