A Comprehensive Dataset for Evaluating Approaches of Various Meta-learning Tasks

New approaches in pattern recognition are typically evaluated on standard datasets, e.g. from UCI or StatLib. Using the same publicly available datasets improves the comparability and reproducibility of evaluations. In meta-learning, however, the dataset used for evaluation is itself constructed from many other datasets. Unfortunately, no comprehensive meta-learning dataset is currently publicly available. In this paper, we present a novel, publicly available meta-learning dataset built from 83 datasets, six classification algorithms, and 49 meta-features. Several target variables are included as ground-truth information: the accuracy and training time of the classifiers as well as parameter-dependent measures. The meta-dataset can therefore be used for various meta-learning tasks, e.g. predicting the accuracy and training time of classifiers or predicting optimal parameter values. The presented meta-dataset enables convincing and comparable evaluations of new meta-learning approaches.
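To make the structure of such a meta-dataset concrete, the following is a minimal Python sketch of one way it could be organized and used for one of the tasks named above: predicting a classifier's accuracy or training time on an unseen dataset from that dataset's meta-features. The record layout (`MetaRecord` and its field names) and the k-nearest-neighbour meta-learner are illustrative assumptions, not the published file format or the authors' method. k-NN is used only because it is a simple, common baseline meta-learner.

```python
import math
from dataclasses import dataclass


@dataclass
class MetaRecord:
    """Hypothetical row layout: one (base dataset, classifier) pair."""
    dataset: str                 # name of the base dataset
    classifier: str              # name of the classification algorithm
    meta_features: list[float]   # meta-feature vector describing the dataset
    accuracy: float              # ground-truth target: classifier accuracy
    train_time: float            # ground-truth target: training time


def predict_target(records, classifier, query_features, target="accuracy", k=3):
    """Predict `target` for `classifier` on a new dataset by averaging the
    target over the k base datasets whose meta-features are closest to the
    query (Euclidean distance). A k-NN baseline, assumed for illustration."""
    candidates = [r for r in records if r.classifier == classifier]
    if not candidates:
        raise ValueError(f"no records for classifier {classifier!r}")
    # Sort base datasets by distance in meta-feature space.
    candidates.sort(key=lambda r: math.dist(r.meta_features, query_features))
    nearest = candidates[:k]
    return sum(getattr(r, target) for r in nearest) / len(nearest)


def leave_one_out_mae(records, classifier, target="accuracy", k=3):
    """Mean absolute error when each base dataset is held out in turn and
    its target is predicted from the remaining base datasets."""
    rows = [r for r in records if r.classifier == classifier]
    if not rows:
        raise ValueError(f"no records for classifier {classifier!r}")
    errors = []
    for held_out in rows:
        rest = [r for r in rows if r.dataset != held_out.dataset]
        pred = predict_target(rest, classifier, held_out.meta_features, target, k)
        errors.append(abs(pred - getattr(held_out, target)))
    return sum(errors) / len(errors)
```

In practice meta-features lie on very different scales, so they would usually be normalized (e.g. min-max scaled) before computing distances; the sketch omits this for brevity. Holding out one base dataset at a time mimics evaluating a meta-learning approach on datasets it has not seen, which is the kind of comparable evaluation a shared meta-dataset is meant to support.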
