Learning from the Past with Experiment Databases

Thousands of machine learning research papers contain experimental comparisons, but these are usually conducted with a single focus of interest, and their detailed results are often lost after publication. Yet when all these past experiments are collected in experiment databases, they can readily be reused for additional and possibly much broader investigations. In this paper, we use such a database to answer a range of research questions about learning algorithms and to verify a number of recent studies. Besides performing elaborate comparisons of algorithms, we investigate the effects of algorithm parameters and data properties, and seek deeper insight into the behavior of learning algorithms by studying their learning curves and bias-variance profiles.
