HPOBench: A Collection of Reproducible Multi-Fidelity Benchmark Problems for HPO

To achieve peak predictive performance, hyperparameter optimization (HPO) is a crucial component of machine learning and its applications. Over the past years, the number of efficient algorithms and tools for HPO has grown substantially, yet the community still lacks realistic, diverse, computationally cheap, and standardized benchmarks. This is especially the case for multi-fidelity HPO methods. To close this gap, we propose HPOBench, which includes 7 existing and 5 new benchmark families, with more than 100 multi-fidelity benchmark problems in total. HPOBench makes it possible to run this extensible set of multi-fidelity HPO benchmarks in a reproducible way by isolating and packaging the individual benchmarks in containers. It also provides surrogate and tabular benchmarks for computationally affordable yet statistically sound evaluations. To demonstrate HPOBench's broad compatibility and usefulness, we conduct an exemplary large-scale study evaluating 6 well-known multi-fidelity HPO tools.
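To illustrate how one of the containerized multi-fidelity benchmarks can be queried, the sketch below samples a configuration and evaluates it at a reduced fidelity. The module path, benchmark name, task id, and fidelity keys (XGBoostBenchmark, task_id, n_estimators, dataset_fraction) are assumptions based on the public HPOBench repository and may differ between versions; consult the repository for the exact API.

```python
# Minimal sketch of querying an HPOBench benchmark (assumed API; see the
# HPOBench repository for exact module paths and fidelity names).
from hpobench.container.benchmarks.ml.xgboost_benchmark import XGBoostBenchmark  # assumed path

# The container backend fetches and starts a Singularity container for this benchmark.
benchmark = XGBoostBenchmark(task_id=167149)  # assumed OpenML task id, for illustration only

# Sample a configuration from the benchmark's configuration space.
config = benchmark.get_configuration_space(seed=1).sample_configuration()

# Evaluate the configuration at a reduced fidelity (fewer boosting rounds, subsampled data).
result = benchmark.objective_function(
    configuration=config,
    fidelity={"n_estimators": 64, "dataset_fraction": 0.5},  # assumed fidelity keys
    rng=1,
)
print(result["function_value"], result["cost"])
```

Because every benchmark exposes the same objective_function/fidelity interface, an HPO tool written against this interface can be benchmarked across all families without per-benchmark glue code.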
