Searching in the Forest for Local Bayesian Optimization

Because of its sample efficiency, Bayesian optimization (BO) has become a popular approach for dealing with expensive black-box optimization problems such as hyperparameter optimization (HPO). Recent empirical studies have shown that the loss landscapes of HPO problems tend to be more benign than previously assumed, i.e., in the best case unimodal and convex, so that a BO framework could be more efficient if it focused on such promising local regions. In this paper, we propose BOinG, a two-stage approach tailored toward mid-sized configuration spaces, as encountered in many HPO problems. In the first stage, we build a scalable global surrogate model with a random forest to describe the overall landscape structure. We then choose a promising subregion via a bottom-up approach on the upper-level tree structure. In the second stage, a local model in this subregion is used to suggest the next point to evaluate. Empirical experiments show that BOinG is able to exploit the structure of typical HPO problems and performs particularly well on mid-sized problems from synthetic functions and HPO.
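The following is a minimal sketch of the two-stage loop described above, not the paper's actual implementation: it assumes a scikit-learn RandomForestRegressor as the global surrogate and a scikit-learn Gaussian process as the local model, selects the subregion as a box around the points the forest predicts to be best (a simplification of BOinG's bottom-up traversal of the forest's tree structure), and uses expected improvement inside the subregion. Function names such as propose_next are hypothetical.

```python
# Hedged sketch of a BOinG-style two-stage proposal step (assumptions noted below).
import numpy as np
from scipy.stats import norm
from sklearn.ensemble import RandomForestRegressor
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern


def expected_improvement(mu, sigma, best_y):
    """Standard expected improvement for minimization."""
    sigma = np.maximum(sigma, 1e-9)
    z = (best_y - mu) / sigma
    return (best_y - mu) * norm.cdf(z) + sigma * norm.pdf(z)


def propose_next(X, y, bounds, n_candidates=2000, top_k=10, rng=None):
    """Suggest the next point: global RF picks a subregion, local GP optimizes EI in it."""
    rng = np.random.default_rng(rng)
    dim = bounds.shape[0]

    # Stage 1: global random-forest surrogate over the full space; here the
    # subregion is a box around the candidates the forest predicts to be best
    # (the paper instead selects it bottom-up on the forest's tree structure).
    rf = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)
    global_cands = rng.uniform(bounds[:, 0], bounds[:, 1], size=(n_candidates, dim))
    best_pred = global_cands[np.argsort(rf.predict(global_cands))[:top_k]]
    lo, hi = best_pred.min(axis=0), best_pred.max(axis=0)
    hi = np.where(hi > lo, hi, lo + 1e-3)  # avoid a degenerate box

    # Stage 2: local GP fitted on observations inside the subregion suggests
    # the next point via expected improvement (fall back to all data if the
    # subregion contains too few observations).
    inside = np.all((X >= lo) & (X <= hi), axis=1)
    X_loc, y_loc = (X[inside], y[inside]) if inside.sum() >= 2 else (X, y)
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
    gp.fit(X_loc, y_loc)

    local_cands = rng.uniform(lo, hi, size=(n_candidates, dim))
    mu, sigma = gp.predict(local_cands, return_std=True)
    ei = expected_improvement(mu, sigma, y.min())
    return local_cands[np.argmax(ei)]


if __name__ == "__main__":
    # Toy usage on a 2-D sphere function (for illustration only).
    rng = np.random.default_rng(0)
    bounds = np.array([[-5.0, 5.0], [-5.0, 5.0]])
    X = rng.uniform(bounds[:, 0], bounds[:, 1], size=(30, 2))
    y = (X ** 2).sum(axis=1)
    for _ in range(10):
        x_next = propose_next(X, y, bounds, rng=rng)
        X = np.vstack([X, x_next])
        y = np.append(y, (x_next ** 2).sum())
    print("best value found:", y.min())
```

Using a random forest globally keeps the first stage cheap and scalable, while the GP is only ever fitted on the (small) set of observations inside the current subregion, which is where its sample efficiency matters most.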
