STATLOG: COMPARISON OF CLASSIFICATION ALGORITHMS ON LARGE REAL-WORLD PROBLEMS

This paper describes work in the StatLog project comparing classification algorithms on large real-world problems. The algorithms compared were from symbolic learning (CART, C4.5, NewID, AC2, ITrule, Cal5, CN2), statistics (Naive Bayes, k-nearest neighbor, kernel density, linear discriminant, quadratic discriminant, logistic regression, projection pursuit, Bayesian networks), and neural networks (backpropagation, radial basis functions). Twelve data sets were used: five from image analysis, three from medicine, and two each from engineering and finance. We found that which algorithm performed best depended critically on the data set investigated. We therefore developed a set of data set descriptors to help decide which algorithms are suited to particular data sets. For example, data sets with extreme distributions (skew > 1 and kurtosis > 7) and with many binary/categorical attributes (>38%) tend to favor symbolic learning algorithms. We suggest how classification algorithms can be extended in a number of d...
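The descriptor rule quoted in the abstract can be illustrated with a minimal Python sketch. Only the three thresholds (skew > 1, kurtosis > 7, more than 38% binary/categorical attributes) come from the text. The function names, the use of population moments, the non-excess kurtosis convention (a normal distribution scores 3), and the averaging of per-attribute values into one data set descriptor are all assumptions made for illustration, not the project's documented procedure.

```python
from statistics import fmean


def skewness(xs):
    """Population skewness: third central moment divided by sigma cubed."""
    m = fmean(xs)
    var = fmean([(x - m) ** 2 for x in xs])
    if var == 0:
        return 0.0  # constant attribute: treat as perfectly symmetric
    return fmean([(x - m) ** 3 for x in xs]) / var ** 1.5


def kurtosis(xs):
    """Non-excess kurtosis (assumed convention): fourth central moment / sigma^4."""
    m = fmean(xs)
    var = fmean([(x - m) ** 2 for x in xs])
    if var == 0:
        return 0.0
    return fmean([(x - m) ** 4 for x in xs]) / var ** 2


def describe(numeric_columns, n_categorical):
    """Hypothetical data set descriptors.

    numeric_columns: list of lists, one per continuous attribute.
    n_categorical:   count of binary/categorical attributes.
    """
    n_total = len(numeric_columns) + n_categorical
    return {
        # Averaging over attributes is an assumption of this sketch.
        "skew": fmean([abs(skewness(c)) for c in numeric_columns]),
        "kurtosis": fmean([kurtosis(c) for c in numeric_columns]),
        "categorical_fraction": n_categorical / n_total,
    }


def favors_symbolic(desc):
    """Apply the thresholds stated in the abstract as a single conjunction."""
    return (
        desc["skew"] > 1
        and desc["kurtosis"] > 7
        and desc["categorical_fraction"] > 0.38
    )


if __name__ == "__main__":
    heavy_tailed = [0] * 9 + [10]  # one extreme value among zeros
    desc = describe([heavy_tailed], n_categorical=1)
    print(desc, favors_symbolic(desc))
```

The rule is a tendency reported in the abstract rather than a guarantee, so a check like `favors_symbolic` is best read as a first filter for which algorithm families to try, not as a selector.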
