A meta-learning recommender system for hyperparameter tuning: predicting when tuning improves SVM classifiers

Abstract

For many machine learning algorithms, predictive performance is critically affected by the hyperparameter values used to train them. However, tuning these hyperparameters can come at a high computational cost, especially on larger datasets, and the tuned settings do not always significantly outperform the default values. This paper proposes a recommender system based on meta-learning to identify, for each new dataset, when it is better to use default values and when hyperparameters should be tuned. An extensive analysis of different categories of meta-features, meta-learners, and experimental setups is performed across 156 datasets. In addition, an in-depth analysis is performed to understand what the meta-learners take into account when making their decisions, providing useful insights. Results show that it is possible to accurately predict when tuning will significantly improve the performance of the induced models. The proposed system reduces the time spent on optimization without reducing the predictive performance of the induced models, compared with models obtained using tuned hyperparameters. We also explain the decision-making process of the meta-learners in terms of linear separability-based hypotheses. Although the analysis focuses on the tuning of Support Vector Machines, the approach can also be applied to other algorithms, as shown in experiments performed with decision trees.
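To make the overall pipeline concrete, the sketch below illustrates one way the meta-learning setup described above could be assembled; it is not the authors' implementation. Every specific choice here is an illustrative assumption: three simple meta-features (dataset size, dimensionality, class balance), a coarse grid over the SVM's C and gamma, a paired t-test at alpha = 0.05 as the "tuning significantly helps" criterion, and a random forest as the meta-learner.

# Minimal sketch of the meta-learning recommender; all design choices
# below (meta-features, grid, test, meta-learner) are assumptions for
# illustration, not the configuration used in the paper.
import numpy as np
from scipy.stats import ttest_rel
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def meta_features(X, y):
    """Describe a dataset: size, dimensionality, and class balance."""
    n, p = X.shape
    _, counts = np.unique(y, return_counts=True)
    return [n, p, p / n, counts.min() / counts.max()]

def tuning_label(X, y, alpha=0.05):
    """1 if a coarsely tuned SVM significantly beats the default SVM."""
    default = cross_val_score(SVC(), X, y, cv=10)
    tuned = max(
        (cross_val_score(SVC(C=C, gamma=g), X, y, cv=10)
         for C in (0.1, 1, 10, 100) for g in (1e-3, 1e-2, 1e-1, 1.0)),
        key=np.mean,
    )
    # Paired test over the per-fold scores of tuned vs. default models.
    _, p_value = ttest_rel(tuned, default)
    return int(p_value < alpha and tuned.mean() > default.mean())

def fit_recommender(datasets):
    """datasets: iterable of (X, y) pairs; each dataset is one meta-example."""
    M = np.array([meta_features(X, y) for X, y in datasets])
    labels = np.array([tuning_label(X, y) for X, y in datasets])
    return RandomForestClassifier(n_estimators=500).fit(M, labels)

For a new dataset, recommender.predict([meta_features(X_new, y_new)]) then answers the binary question the paper poses: spend time on hyperparameter optimization, or accept the defaults. The paper itself evaluates far richer meta-feature categories (e.g., data complexity and landmarking measures) and several meta-learners; this sketch only fixes the shape of the pipeline.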
