Bayesian Optimization with a Prior for the Optimum

While Bayesian Optimization (BO) is a popular method for optimizing expensive black-box functions, it fails to leverage the experience of domain experts. As a result, BO wastes function evaluations on design choices (e.g., machine learning hyperparameter settings) that the expert already knows will perform poorly. To address this issue, we introduce Bayesian Optimization with a Prior for the Optimum (BOPrO). BOPrO allows users to inject their knowledge into the optimization process in the form of priors about which parts of the input space will yield the best performance, rather than BO's standard priors over functions, which are much less intuitive for users. BOPrO then combines these priors with BO's standard probabilistic model to form a pseudo-posterior used to select which points to evaluate next. We show that BOPrO is around 6.67× faster than state-of-the-art methods on a common suite of benchmarks, and achieves state-of-the-art performance on a real-world hardware design application. We also show that BOPrO converges faster even when the priors for the optimum are not entirely accurate, and that it robustly recovers from misleading priors.
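To make the pseudo-posterior construction concrete, below is a minimal 1-D sketch in the spirit of the abstract, not the authors' implementation: it pairs a TPE-style kernel-density model of "good" and "bad" observations with a hypothetical Gaussian prior over the optimum's location, raising the prior to a decaying exponent β/t so its influence fades as observations accumulate. The toy objective f, the prior parameters, and the constants GAMMA, BETA, and N_CANDIDATES are all illustrative assumptions.

```python
import numpy as np
from scipy.stats import norm, gaussian_kde

rng = np.random.default_rng(0)

def f(x):
    """Toy black-box objective to minimize (stand-in for an expensive function)."""
    return np.sin(3.0 * x) + 0.5 * (x - 0.6) ** 2

# Hypothetical expert prior: the user believes the optimum lies near x = 0.7.
prior = norm(loc=0.7, scale=0.2)

def density(points):
    """KDE over observed points; wide-Gaussian fallback when KDE is ill-posed."""
    if len(points) > 1 and np.ptp(points) > 0:
        return gaussian_kde(points)
    return norm(loc=float(np.mean(points)), scale=1.0).pdf

GAMMA = 0.15        # quantile splitting "good" from "bad" observations
BETA = 10.0         # prior confidence: larger keeps the prior influential longer
N_CANDIDATES = 2000
EPS = 1e-12

X = rng.uniform(-2.0, 2.0, size=5)   # small initial random design
y = f(X)

for t in range(1, 31):
    tau = np.quantile(y, GAMMA)                  # TPE-style threshold
    good, bad = X[y <= tau], X[y > tau]
    g_model, b_model = density(good), density(bad)

    xs = rng.uniform(-2.0, 2.0, size=N_CANDIDATES)
    p_norm = prior.pdf(xs) / prior.pdf(xs).max()  # prior rescaled to [0, 1]

    # Pseudo-posteriors: prior raised to a decaying exponent beta/t times the
    # model density, so the prior dominates early and the model takes over.
    g = p_norm ** (BETA / t) * (g_model(xs) + EPS)
    b = (1.0 - p_norm) ** (BETA / t) * (b_model(xs) + EPS)

    x_next = xs[np.argmax(g / (b + EPS))]        # EI-like density ratio
    X = np.append(X, x_next)
    y = np.append(y, f(x_next))

best = np.argmin(y)
print(f"best x = {X[best]:.3f}, best f(x) = {y[best]:.4f}")
```

With an accurate prior, early candidates concentrate near x = 0.7; with a misleading prior, the decaying exponent lets the kernel-density model eventually dominate, mirroring the recovery behavior described in the abstract.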
