OpenML: An R package to connect to the machine learning platform OpenML

OpenML is an online machine learning platform where researchers can share data, machine learning tasks and experiments, and organize them online to work and collaborate more efficiently. In this paper, we present an R package that interfaces with the OpenML platform and illustrate its usage in combination with the machine learning R package mlr (Bischl et al. J Mach Learn Res 17(170):1–5, 2016). We show how the OpenML package allows R users to search, download and upload data sets and machine learning tasks. We also show how to upload the results of experiments, share them with others and download results from other users. Beyond ensuring reproducibility of results, the OpenML platform automates much of the drudge work, speeds up research, facilitates collaboration and increases users’ visibility online.

[1] Hendrik Blockeel, et al. A new way to share, organize and learn from experiments, 2012.

[2] Geoff Holmes, et al. Experiment databases, 2012, Machine Learning.

[3] Cedric E. Ginestet. ggplot2: Elegant Graphics for Data Analysis, 2011.

[4] Hadley Wickham, et al. ggplot2 - Elegant Graphics for Data Analysis (2nd Edition), 2017.

[5] Andy Liaw, et al. Classification and Regression by randomForest, 2007.

[6] Steven L. Goldman. Reinventing Discovery: The New Era of Networked Science, 2014.

[7] Geoff Holmes, et al. MOA: Massive Online Analysis, 2010, J. Mach. Learn. Res.

[8] B. Ripley, et al. Recursive Partitioning and Regression Trees, 2015.

[9] Bernd Bischl, et al. batchtools: Tools for R to work on batch systems, 2017, J. Open Source Softw.

[10] Luís Torgo, et al. OpenML: A Collaborative Science Platform, 2013, ECML/PKDD.

[11] Gaël Varoquaux, et al. Scikit-learn: Machine Learning in Python, 2011, J. Mach. Learn. Res.

[12] Jan N. van Rijn, et al. Does Feature Selection Improve Classification? A Large Scale Experiment in OpenML, 2016, IDA.

[13] Bernd Bischl, et al. mlr Tutorial, 2016, ArXiv.

[14] Bernd Bischl, et al. mlr: Machine Learning in R, 2016, J. Mach. Learn. Res.

[15] Bernd Bischl, et al. Automatic model selection for high-dimensional survival analysis, 2015.

[16] J. Carpenter. May the best analyst win, 2011, Science.

[17] Bernd Bischl, et al. mlrMBO: A Modular Framework for Model-Based Optimization of Expensive Black-Box Functions, 2017, arXiv:1703.03373.

[18] Michael A. Nielsen, et al. Reinventing Discovery: The New Era of Networked Science, 2011.

[19] Frank Hutter, et al. Initializing Bayesian Hyperparameter Optimization via Meta-Learning, 2015, AAAI.

[20] Bernd Bischl, et al. Multilabel Classification with R Package mlr, 2017, R J.

[21] K. Hornik, et al. Unbiased Recursive Partitioning: A Conditional Inference Framework, 2006.

[22] Ian H. Witten, et al. The WEKA data mining software: an update, 2009, SKDD.

[23] Luís Torgo, et al. OpenML: networked science in machine learning, 2014, SKDD.

[24] Luís Torgo, et al. A RapidMiner extension for open machine learning, 2013.

[25] Bernd Bischl, et al. BatchJobs and BatchExperiments: Abstraction Mechanisms for Using R in Batch Environments, 2015.