Organizing the World's Machine Learning Information

All around the globe, thousands of learning experiments are executed every day, only to be discarded after interpretation. Yet the information contained in these experiments may have uses beyond their original intent and, if properly stored, could be of great value to future research. In this paper, we hope to stimulate the development of repositories for such learning experiments by providing a bird’s-eye view of how they can be created and used in practice, bringing together existing approaches and new ideas. We draw parallels with how experiments are curated in other sciences, and subsequently discuss how both the empirical and theoretical details of learning experiments can be expressed, organized and made universally accessible. Finally, we discuss a range of services such a resource can offer, whether used directly or integrated into data mining tools.
