Meta-learning for symbolic hyperparameter defaults

Hyperparameter optimization in machine learning (ML) deals with the problem of empirically learning an optimal algorithm configuration from data, usually formulated as a black-box optimization problem. In this work, we propose a zero-shot method to meta-learn symbolic default hyperparameter configurations that are expressed in terms of the properties of the dataset. This enables a much faster, but still data-dependent, configuration of the ML algorithm compared to standard hyperparameter optimization approaches. In the past, symbolic and static default values have usually been obtained as hand-crafted heuristics. We propose an approach to learning such symbolic configurations as formulas over dataset properties, using a large set of prior evaluations on multiple datasets and optimizing over a grammar of expressions with an evolutionary algorithm. We evaluate our method on surrogate empirical performance models as well as on real data across six ML algorithms and more than 100 datasets, and demonstrate that our method indeed finds viable symbolic defaults.
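
To make the idea of "optimizing over a grammar of expressions with an evolutionary algorithm" concrete, the following is a minimal sketch, not the authors' implementation, of evolving a symbolic default with genetic programming using the DEAP library. The meta-features (n_features, n_instances), the toy table of prior evaluations, and the assumption that the best value of the target hyperparameter behaves like 1/n_features are all hypothetical placeholders standing in for real surrogate models fitted on prior experiment data.

```python
# Sketch: meta-learn a symbolic default as a formula over dataset meta-features.
import operator
import random

from deap import algorithms, base, creator, gp, tools

random.seed(0)

# Grammar of expressions: arithmetic over two dataset meta-features.
pset = gp.PrimitiveSet("SYMBOLIC_DEFAULT", 2)
pset.renameArguments(ARG0="n_features", ARG1="n_instances")
pset.addPrimitive(operator.add, 2)
pset.addPrimitive(operator.sub, 2)
pset.addPrimitive(operator.mul, 2)

def protected_div(a, b):
    """Division that tolerates near-zero denominators."""
    return a / b if abs(b) > 1e-12 else 1.0

pset.addPrimitive(protected_div, 2)
pset.addTerminal(1.0)
pset.addTerminal(2.0)

creator.create("FitnessMin", base.Fitness, weights=(-1.0,))
creator.create("Individual", gp.PrimitiveTree, fitness=creator.FitnessMin)

toolbox = base.Toolbox()
toolbox.register("expr", gp.genHalfAndHalf, pset=pset, min_=1, max_=3)
toolbox.register("individual", tools.initIterate, creator.Individual, toolbox.expr)
toolbox.register("population", tools.initRepeat, list, toolbox.individual)
toolbox.register("compile", gp.compile, pset=pset)

# Toy stand-in for prior evaluations / surrogate models: for each "dataset"
# we assume (hypothetically) that the best hyperparameter value is 1 / n_features.
datasets = [{"n_features": p, "n_instances": n, "best_value": 1.0 / p}
            for p, n in [(4, 150), (20, 1000), (64, 5000), (300, 20000)]]

def evaluate(individual):
    """Average, over datasets, how far the candidate formula is from the best value."""
    formula = toolbox.compile(expr=individual)
    error = 0.0
    for d in datasets:
        predicted = formula(d["n_features"], d["n_instances"])
        error += abs(predicted - d["best_value"])
    return (error / len(datasets),)

toolbox.register("evaluate", evaluate)
toolbox.register("select", tools.selTournament, tournsize=3)
toolbox.register("mate", gp.cxOnePoint)
toolbox.register("expr_mut", gp.genFull, min_=0, max_=2)
toolbox.register("mutate", gp.mutUniform, expr=toolbox.expr_mut, pset=pset)

pop = toolbox.population(n=100)
pop, _ = algorithms.eaSimple(pop, toolbox, cxpb=0.5, mutpb=0.2, ngen=30, verbose=False)

best = tools.selBest(pop, 1)[0]
print("Best symbolic default found:", best)  # e.g. protected_div(1.0, n_features)
```

In the sketch the fitness is a simple regret proxy (distance to an assumed best value); in a realistic setup it would instead be the performance predicted by surrogate models, or measured on held-out tasks, when the formula's output is plugged in as the algorithm's hyperparameter value.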
