An Iterative Process for Building Learning Curves and Predicting Relative Performance of Classifiers

This paper addresses the problem of predicting the relative performance of classification algorithms. Our approach requires that experiments be conducted on small samples of the data. The results are used to identify the nearest stored learning curve, that is, a curve for which the sampling procedure was carried out in full. That curve then yields a prediction of the relative performance of the algorithms. The method automatically determines how many samples are needed and what their sizes should be. It does so iteratively, taking into account the results of all previous experiments, both those on other datasets and those obtained so far on the new dataset. Experimental evaluation has shown that the method achieves better performance than previous approaches.
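The nearest-curve step described above can be illustrated with a minimal sketch. The abstract does not specify the distance measure or how sample sizes are selected, so everything here is an assumption made for illustration: the `Curve` representation, the mean-absolute-difference distance in `curve_distance`, the `predict_winner` helper, and the toy accuracy values are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Tuple

# Hypothetical representation of a partial learning curve for two
# algorithms A and B: sample size -> (accuracy of A, accuracy of B).
Curve = Dict[int, Tuple[float, float]]


def curve_distance(new: Curve, stored: Curve) -> float:
    """Mean absolute accuracy difference over the sample sizes both curves share.

    Assumed distance measure; the paper may use a different one.
    """
    shared = sorted(set(new) & set(stored))
    if not shared:
        return float("inf")  # no common sample sizes, so not comparable
    total = 0.0
    for size in shared:
        total += abs(new[size][0] - stored[size][0])
        total += abs(new[size][1] - stored[size][1])
    return total / (2 * len(shared))


def predict_winner(new: Curve, past: List[Tuple[Curve, str]]) -> str:
    """Return the full-data winner recorded for the nearest stored curve."""
    _, winner = min(past, key=lambda item: curve_distance(new, item[0]))
    return winner


# Toy data (invented): complete curves from two earlier datasets,
# each labelled with the algorithm that won on the full data.
past_curves = [
    ({100: (0.70, 0.65), 200: (0.74, 0.70), 400: (0.78, 0.76)}, "A"),
    ({100: (0.60, 0.66), 200: (0.63, 0.71), 400: (0.65, 0.75)}, "B"),
]

# Partial curve on the new dataset: only the two smallest samples were run.
new_curve = {100: (0.61, 0.65), 200: (0.64, 0.70)}

print(predict_winner(new_curve, past_curves))
```

In an iterative variant, one might add a further, larger sample to `new_curve` and repeat the lookup until the prediction is judged reliable; how the paper decides which sample to add next is not stated in the abstract, so that step is left out of the sketch.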
