Learning multiple defaults for machine learning algorithms

The performance of modern machine learning methods depends heavily on their hyperparameter configurations. One simple way of selecting a configuration is to use default settings, often proposed along with the publication and implementation of a new algorithm. These default values are usually chosen in an ad hoc manner to work well enough on a wide variety of datasets. To address this problem, different automatic hyperparameter optimization algorithms have been proposed, which select an optimal configuration per dataset. This principled approach usually improves performance, but it adds algorithmic complexity and computational cost to the training procedure. As an alternative, we propose learning a set of complementary default values from a large database of prior empirical results. Selecting an appropriate configuration on a new dataset then requires only a simple, efficient, and embarrassingly parallel search over this set. We demonstrate the effectiveness and efficiency of the proposed approach in comparison to random search and Bayesian optimization.
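To make the idea concrete, below is a minimal sketch of one way such a set of complementary defaults could be learned: greedy forward selection over a matrix of prior results, where each step adds the configuration that most improves the average per-dataset best score. Everything here is an illustrative assumption rather than the paper's exact procedure: the performance matrix `perf`, the function name `greedy_defaults`, the set size `k`, and aggregation by the mean are all placeholders.

```python
import numpy as np

def greedy_defaults(perf: np.ndarray, k: int) -> list[int]:
    """Greedily select k complementary default configurations.

    perf: (n_datasets, n_configs) array of prior performance scores
          (higher is better), e.g. cross-validated accuracies collected
          on a database of earlier experiments.
    k:    number of defaults to select.
    Returns the column indices of the selected configurations.
    """
    n_datasets, n_configs = perf.shape
    selected: list[int] = []
    # Best score achieved so far on each dataset by the selected set.
    best_so_far = np.full(n_datasets, -np.inf)
    for _ in range(k):
        # Marginal value of each candidate: the new per-dataset maximum
        # if that candidate were added, averaged over all datasets.
        gains = np.maximum(perf, best_so_far[:, None]).mean(axis=0)
        gains[selected] = -np.inf  # never pick a configuration twice
        best = int(np.argmax(gains))
        selected.append(best)
        best_so_far = np.maximum(best_so_far, perf[:, best])
    return selected

# Hypothetical usage with synthetic prior results:
rng = np.random.default_rng(0)
perf = rng.random((50, 200))          # 50 datasets x 200 candidate configs
defaults = greedy_defaults(perf, k=8)  # indices of 8 complementary defaults
```

On a new dataset, each of the `k` returned configurations can then be evaluated independently (e.g., by cross-validation) and the best one kept, which is the simple, embarrassingly parallel search described above. One appealing property of this kind of objective (the mean over datasets of the per-dataset maximum) is that it is monotone submodular, so greedy selection comes with the classic constant-factor approximation guarantee.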
