On learning algorithm selection for classification

This paper introduces a new method for evaluating and selecting learning algorithms, with empirical results from classification. The empirical study covers eight classification algorithms applied to 100 different classification problems, and evaluates each algorithm's performance under a variety of accuracy and complexity measures. Consistent with the No Free Lunch theorem, we do not expect to identify a single algorithm that performs best on all datasets. Rather, we aim to determine which characteristics of a dataset lend it to superior modelling by particular learning algorithms. The empirical results are then used to generate rules, via the rule-based learning algorithm C5.0, describing which types of algorithms suit which types of classification problems. Most of the rules are generated with a high confidence rating.
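
The abstract describes a two-stage, meta-learning style pipeline: first evaluate every candidate algorithm on every dataset and record each dataset's characteristics together with the winning algorithm, then induce rules over that meta-data. The Python below is a minimal sketch of that pipeline, not the paper's implementation: scikit-learn classifiers stand in for the eight algorithms studied, a handful of synthetic problems stand in for the 100 real datasets, the meta-features are illustrative choices rather than the characteristics measured in the study, and a CART decision tree substitutes for C5.0, which has no standard open-source Python implementation.

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.model_selection import cross_val_score
    from sklearn.naive_bayes import GaussianNB
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.svm import SVC
    from sklearn.tree import DecisionTreeClassifier, export_text

    # Candidate learners; stand-ins for the eight algorithms in the study.
    candidates = {
        "tree": DecisionTreeClassifier(random_state=0),
        "naive_bayes": GaussianNB(),
        "knn": KNeighborsClassifier(),
        "svm": SVC(),
    }

    def meta_features(X, y):
        # Illustrative dataset characteristics (size, dimensionality,
        # class count, average feature spread); not the paper's measures.
        n, d = X.shape
        return [n, d, len(np.unique(y)), float(np.std(X, axis=0).mean())]

    def best_algorithm(X, y):
        # Label the dataset with the classifier that wins under 5-fold CV
        # accuracy; the study also considers complexity measures.
        scores = {name: cross_val_score(clf, X, y, cv=5).mean()
                  for name, clf in candidates.items()}
        return max(scores, key=scores.get)

    # Stage 1: synthetic problems in place of the 100 real datasets.
    datasets = [
        make_classification(n_samples=200, n_features=f,
                            n_informative=max(2, f // 2), random_state=i)
        for i, f in enumerate([5, 10, 20, 40] * 3)
    ]
    meta_X = [meta_features(X, y) for X, y in datasets]
    meta_y = [best_algorithm(X, y) for X, y in datasets]

    # Stage 2: induce rules over the meta-data. The paper uses C5.0;
    # a depth-limited CART tree is a rough open-source stand-in.
    rule_learner = DecisionTreeClassifier(max_depth=3, random_state=0)
    rule_learner.fit(meta_X, meta_y)
    print(export_text(rule_learner,
                      feature_names=["n_samples", "n_features",
                                     "n_classes", "mean_feature_std"]))

The printed tree reads as rules of the form "if n_features <= t then prefer algorithm X", which is the flavour of output the C5.0 stage produces over the real meta-data in the study.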
