Hyperparameter optimization: Foundations, algorithms, best practices, and open challenges

Most machine learning algorithms are configured by a set of hyperparameters whose values must be carefully chosen and which often considerably impact performance. To avoid a time-consuming and irreproducible manual trial-and-error process for finding well-performing hyperparameter configurations, various automatic hyperparameter optimization (HPO) methods can be employed, for example, methods based on resampling error estimation for supervised machine learning. After introducing HPO from a general perspective, this paper reviews important HPO methods, from simple techniques such as grid or random search to more advanced methods like evolution strategies, Bayesian optimization, Hyperband, and racing. It gives practical recommendations regarding the important choices that must be made when conducting HPO, including the HPO algorithm itself, performance evaluation, how to combine HPO with machine learning pipelines, runtime improvements, and parallelization. An appendix provides information on specific software packages in R and Python, as well as recommended hyperparameter search spaces for specific learning algorithms. Supplementary notebooks demonstrate the concepts discussed in this work.
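To make the basic HPO loop concrete, below is a minimal sketch of random search combined with cross-validated error estimation, using scikit-learn. The dataset, learner (an RBF SVM), search ranges, and budget are illustrative assumptions chosen for this sketch, not the search spaces recommended in the paper.

```python
# Minimal sketch: random-search HPO with cross-validated error estimation.
# Assumes scikit-learn and scipy; dataset, learner, and ranges are placeholders.
from scipy.stats import loguniform
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

# Hyperparameter search space: cost and kernel width sampled on a log scale.
param_distributions = {
    "C": loguniform(1e-3, 1e3),
    "gamma": loguniform(1e-4, 1e1),
}

search = RandomizedSearchCV(
    estimator=SVC(kernel="rbf"),
    param_distributions=param_distributions,
    n_iter=50,          # budget: number of sampled configurations
    cv=5,               # 5-fold cross-validation as the resampling strategy
    scoring="accuracy",
    random_state=0,
    n_jobs=-1,          # evaluate configurations in parallel
)
search.fit(X, y)

print("Best configuration:", search.best_params_)
print("Cross-validated accuracy:", search.best_score_)
```

Note that the reported best cross-validated score is an optimistically biased estimate of generalization performance; an unbiased estimate requires an additional outer resampling loop (nested cross-validation). More advanced methods such as Bayesian optimization follow the same evaluate-a-configuration loop but propose new candidates via a surrogate model instead of sampling at random. The sketch below uses Optuna (whose default sampler is TPE, a model-based approach) purely for illustration; the objective function and search space are again assumptions of this example.

```python
# Minimal sketch: sequential model-based HPO with Optuna's default TPE sampler.
# Assumes optuna and scikit-learn; the objective returns cross-validated accuracy.
import optuna
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

def objective(trial):
    # Sample a configuration from the search space on a log scale.
    C = trial.suggest_float("C", 1e-3, 1e3, log=True)
    gamma = trial.suggest_float("gamma", 1e-4, 1e1, log=True)
    model = SVC(kernel="rbf", C=C, gamma=gamma)
    return cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=50)
print("Best configuration:", study.best_params)
print("Cross-validated accuracy:", study.best_value)
```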
