Bayesian Optimization is Superior to Random Search for Machine Learning Hyperparameter Tuning: Analysis of the Black-Box Optimization Challenge 2020

This paper presents the results and insights from the black-box optimization (BBO) challenge at NeurIPS 2020, which ran from July to October 2020. The challenge emphasized the importance of evaluating derivative-free optimizers for tuning the hyperparameters of machine learning models, and it was the first black-box optimization challenge with a machine learning emphasis. It was based on the tuning (validation set) performance of standard machine learning models on real datasets. The competition has broad relevance because black-box optimization (e.g., Bayesian optimization) applies to hyperparameter tuning in almost every machine learning project, as well as to many applications outside machine learning. The final leaderboard was determined by optimization performance on held-out (hidden) objective functions, on which the optimizers ran without human intervention. Baselines were set using the default settings of several open-source black-box optimization packages as well as random search.
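
The setup the abstract describes (derivative-free optimizers queried under a fixed evaluation budget and compared against random search) can be made concrete with a toy loop. The sketch below is illustrative only: a hypothetical 1-D objective stands in for the challenge's hidden validation-loss functions, and a minimal Gaussian-process surrogate with expected improvement stands in for the entrants' far more sophisticated methods. It is not the challenge's actual evaluation harness or any competitor's code.

```python
# Minimal sketch: random search vs. a toy GP-based Bayesian optimizer,
# both tuning the same "hidden" objective under a fixed evaluation budget.
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

rng = np.random.default_rng(0)

def objective(x):
    # Hypothetical stand-in for a hidden validation loss; 1-D for readability.
    return np.sin(3 * x) + 0.1 * x ** 2

bounds = (-3.0, 3.0)
budget = 16  # total function evaluations allowed

# --- Baseline: random search, keep the best observed value ---
xs_rand = rng.uniform(*bounds, size=budget)
best_rand = min(objective(x) for x in xs_rand)

# --- Toy Bayesian optimization: GP surrogate + expected improvement ---
X = list(rng.uniform(*bounds, size=4))   # small random initial design
y = [objective(x) for x in X]
for _ in range(budget - len(X)):
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5),
                                  normalize_y=True)
    gp.fit(np.array(X).reshape(-1, 1), y)
    cand = np.linspace(*bounds, 512).reshape(-1, 1)
    mu, sigma = gp.predict(cand, return_std=True)
    best = min(y)
    z = (best - mu) / np.maximum(sigma, 1e-9)
    # Expected improvement for minimization.
    ei = (best - mu) * norm.cdf(z) + sigma * norm.pdf(z)
    x_next = float(cand[np.argmax(ei)])  # "suggest" step
    X.append(x_next)
    y.append(objective(x_next))          # "observe" step
best_bo = min(y)

print(f"random search best: {best_rand:.4f}, BO best: {best_bo:.4f}")
```

The suggest/observe structure of the inner loop mirrors how optimizers in such challenges are driven without human intervention: the harness asks for a candidate, evaluates the hidden objective, and feeds the result back, repeating until the budget is exhausted.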
