Automatic classifier selection for non-experts

Choosing a suitable classifier for a given dataset is an important part of developing a pattern recognition system. Since a large variety of classification algorithms have been proposed in the literature, non-experts often do not know which method to use to obtain good classification results on their data. Meta-learning addresses this problem by recommending promising classifiers based on meta-features computed from a given dataset. In this paper, we empirically evaluate five different categories of state-of-the-art meta-features for their suitability in predicting the classification accuracies of several widely used classifiers (including Support Vector Machines, Neural Networks, Random Forests, Decision Trees, and Logistic Regression). Based on the evaluation results, we have developed the first open-source meta-learning system capable of accurately predicting the accuracies of target classifiers. The user provides a dataset as input and, in a few simple steps, obtains an automatically created, high-performance, ready-to-use pattern recognition system. A user study with non-experts showed that users were able to develop more accurate pattern recognition systems in significantly less development time when using our system than when using a state-of-the-art data mining tool.
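The core idea the abstract describes — learning a mapping from dataset meta-features to classifier accuracy — can be sketched in a few lines. The following is a minimal illustration only, not the paper's actual system: a handful of simple statistical and information-theoretic meta-features are computed per dataset, and a random forest meta-regressor is trained to predict an SVM's cross-validated accuracy on unseen datasets. All parameters, the choice of meta-features, and the use of synthetic datasets are illustrative assumptions.

```python
import numpy as np
from scipy.stats import entropy
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def meta_features(X, y):
    """A few simple statistical / information-theoretic meta-features
    (illustrative subset; the paper evaluates five full categories)."""
    n, d = X.shape
    class_probs = np.bincount(y) / n
    return np.array([
        n,                              # number of instances
        d,                              # number of features
        len(class_probs),               # number of classes
        entropy(class_probs, base=2),   # class entropy
        np.mean(np.std(X, axis=0)),     # mean feature standard deviation
    ])

# Build a small meta-dataset: one row of meta-features per dataset,
# labelled with the cross-validated accuracy of the target classifier (an SVM).
rng = np.random.RandomState(0)
meta_X, meta_y = [], []
for seed in range(30):
    X, y = make_classification(n_samples=rng.randint(100, 500),
                               n_features=rng.randint(8, 21),
                               n_informative=4, random_state=seed)
    meta_X.append(meta_features(X, y))
    meta_y.append(cross_val_score(SVC(), X, y, cv=5).mean())

# Meta-regressor: predicts the SVM's accuracy on a new dataset from its
# meta-features alone, without ever training the SVM on that dataset.
meta_model = RandomForestRegressor(random_state=0).fit(np.array(meta_X), meta_y)

X_new, y_new = make_classification(n_samples=300, n_features=10,
                                   n_informative=4, random_state=99)
predicted_acc = meta_model.predict([meta_features(X_new, y_new)])[0]
print(f"Predicted SVM accuracy: {predicted_acc:.3f}")
```

Repeating this per candidate classifier yields one predicted accuracy each, so the system can recommend the classifier with the highest prediction at negligible cost compared to exhaustively training every candidate.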
