Automatic Exploration of Machine Learning Experiments on OpenML

Understanding the influence of hyperparameters on the performance of a machine learning algorithm is an important scientific topic in its own right and can help to improve automatic hyperparameter tuning procedures. Unfortunately, experimental meta-data for this purpose is still scarce. This paper presents a large, free, and open dataset that addresses this problem, containing results on 38 OpenML datasets, six different machine learning algorithms, and many different hyperparameter configurations. The results were generated by an automated random sampling strategy, termed the OpenML Random Bot. Each algorithm was cross-validated up to 20,000 times per dataset with different hyperparameter settings, resulting in a meta-dataset of around 2.5 million experiments overall.
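
To make the random sampling strategy concrete, below is a minimal sketch of one iteration of such a bot, assuming the OpenML and mlr R packages for task access and cross-validated evaluation. The learner (a ranger random forest), the hyperparameter names and ranges, and the task id are illustrative assumptions, not the actual configuration used by the OpenML Random Bot.

```r
# Sketch of one bot iteration: draw a random hyperparameter configuration,
# evaluate it by cross-validation on an OpenML task, and upload the run.
library(OpenML)        # OpenML task access and run upload
library(mlr)           # learners and resampling
library(ParamHelpers)  # parameter sets and random sampling

# Illustrative hyperparameter space for a ranger random forest
ps = makeParamSet(
  makeIntegerParam("num.trees", lower = 10, upper = 2000),
  makeNumericParam("sample.fraction", lower = 0.1, upper = 1),
  makeIntegerParam("min.node.size", lower = 1, upper = 100)
)

oml.task = getOMLTask(task.id = 3917)   # illustrative OpenML task id

config = sampleValue(ps)                # draw one random configuration
lrn = makeLearner("classif.ranger")
lrn = setHyperPars(lrn, par.vals = config)

run = runTaskMlr(oml.task, lrn)         # cross-validate on the task's splits
# uploadOMLRun(run)                     # share the result on OpenML (requires API key)
```

Repeating this loop over many tasks and many sampled configurations yields the kind of experiment collection described above.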
