Meta-Surrogate Benchmarking for Hyperparameter Optimization

Despite recent progress in hyperparameter optimization (HPO), the available benchmarks that resemble real-world scenarios consist of only a few, very large problem instances that are expensive to solve. This prevents researchers and practitioners not only from systematically running the large-scale comparisons needed to draw statistically significant conclusions, but also from reproducing previously conducted experiments. This work proposes a method to alleviate these issues by means of a meta-surrogate model for HPO tasks, trained on offline-generated data. The model combines a probabilistic encoder with a multi-task model so that it can generate inexpensive yet realistic tasks from the class of problems of interest. We demonstrate that benchmarking HPO methods on samples from the generative model allows us to draw conclusions that are more coherent and statistically significant, and that can be reached orders of magnitude faster than with the original tasks. We provide evidence of our findings for various HPO methods on a wide class of problems.
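To make the generative mechanism concrete, the following is a minimal, self-contained sketch of how benchmark tasks might be sampled from such a meta-surrogate. The names `TaskEncoder`, `MultiTaskSurrogate`, and `sample_task` are hypothetical stand-ins for the paper's components, and the placeholder response surface takes the place of a trained model; this illustrates the idea under stated assumptions, not the authors' implementation.

```python
# Hypothetical sketch: sampling cheap benchmark tasks from a meta-surrogate.
import numpy as np

class TaskEncoder:
    """Hypothetical probabilistic encoder: maps an observed HPO task
    (configurations X, losses y) to a Gaussian over a latent task vector h."""
    def __init__(self, latent_dim=2):
        self.latent_dim = latent_dim

    def posterior(self, X, y):
        # Placeholder moments; a trained encoder would produce these.
        mu = np.full(self.latent_dim, y.mean())
        sigma = np.full(self.latent_dim, 0.1)
        return mu, sigma

class MultiTaskSurrogate:
    """Hypothetical multi-task model f(x, h): predicts the loss of a
    hyperparameter configuration x on the task described by h."""
    def predict(self, x, h):
        # Placeholder response surface standing in for a trained model.
        return float(np.sum((x - h[0]) ** 2) + 0.1 * np.sin(5.0 * x.sum()) + h[1])

def sample_task(surrogate, mu, sigma, rng):
    """Sample a latent task vector and return a cheap objective function,
    i.e. a new benchmark task that mimics the original problem class."""
    h = mu + sigma * rng.standard_normal(mu.shape)
    return lambda x: surrogate.predict(np.asarray(x, dtype=float), h)

rng = np.random.default_rng(0)

# Encode one observed (expensive) HPO task into a latent distribution ...
X_obs, y_obs = rng.uniform(size=(20, 2)), rng.uniform(size=20)
mu, sigma = TaskEncoder().posterior(X_obs, y_obs)

# ... then benchmark HPO methods on samples from the generative model
# instead of on the expensive original task.
task = sample_task(MultiTaskSurrogate(), mu, sigma, rng)
print(task([0.3, -0.7]))  # one inexpensive objective evaluation
```

Because each sampled task is just a forward pass through the surrogate, an HPO method can be evaluated on many realistic task instances in the time a single run on an original task would take, which is what enables the large-scale, statistically grounded comparisons described above.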
