Local Bayesian optimization via maximizing probability of descent

Local optimization presents a promising approach to expensive, high-dimensional black-box optimization by sidestepping the need to globally explore the search space. For objective functions whose gradient cannot be evaluated directly, Bayesian optimization offers one solution -- we construct a probabilistic model of the objective, design a policy to learn about the gradient at the current location, and use the resulting information to navigate the objective landscape. Previous work has realized this scheme by minimizing the variance in the estimate of the gradient, then moving in the direction of the expected gradient. In this paper, we re-examine and refine this approach. We demonstrate that, surprisingly, the direction of the expected gradient is not always the direction maximizing the probability of descent, and in fact, these directions may be nearly orthogonal. This observation then inspires an elegant optimization scheme that seeks to maximize the probability of descent while moving in the direction of most-probable descent. Experiments on both synthetic and real-world objectives show that our method outperforms previous realizations of this optimization scheme and is competitive against other, significantly more complicated baselines.
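
To make the central observation concrete, here is a minimal sketch, assuming the posterior over the gradient at the current point is multivariate normal, grad f ~ N(mu, Sigma), as arises under a Gaussian process model with a differentiable kernel; all names (mu, Sigma, probability_of_descent, etc.) are illustrative and not taken from the paper's code. Under this model, P(v . grad f < 0) = Phi(-v'mu / sqrt(v' Sigma v)), which is maximized by v proportional to -Sigma^{-1} mu, whereas the expected-gradient step follows -mu; when Sigma is ill-conditioned, the two directions can be nearly orthogonal.

    import numpy as np
    from scipy.stats import norm

    def probability_of_descent(v, mu, Sigma):
        """P(v . grad f < 0) when grad f ~ N(mu, Sigma)."""
        v = v / np.linalg.norm(v)
        return norm.cdf(-v @ mu / np.sqrt(v @ Sigma @ v))

    def most_probable_descent_direction(mu, Sigma):
        """Maximizer of the probability of descent: proportional to -Sigma^{-1} mu."""
        v = -np.linalg.solve(Sigma, mu)
        return v / np.linalg.norm(v)

    # An ill-conditioned gradient posterior (hypothetical numbers): the first
    # coordinate has a large mean but huge uncertainty, the second a small
    # mean that is known almost exactly.
    mu = np.array([1.0, 0.1])
    Sigma = np.diag([100.0, 1e-3])

    v_eg = -mu / np.linalg.norm(mu)                     # expected-gradient direction
    v_mpd = most_probable_descent_direction(mu, Sigma)  # most-probable-descent direction

    print(probability_of_descent(v_eg, mu, Sigma))   # ~0.54: barely better than chance
    print(probability_of_descent(v_mpd, mu, Sigma))  # ~0.999: almost surely descends
    print(v_eg @ v_mpd)                              # ~0.1: the directions are nearly orthogonal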
