OpenML: networked science in machine learning

Many sciences have made significant breakthroughs by adopting online tools that help organize, structure, and mine information that is too detailed to be printed in journals. In this paper, we introduce OpenML, a place for machine learning researchers to share and organize data in fine detail, so that they can work more effectively, be more visible, and collaborate with others to tackle harder problems. We discuss how OpenML relates to other examples of networked science and what benefits it brings to machine learning research, to individual scientists, and to students and practitioners.
