Comparing naive Bayes, decision trees, and SVM with AUC and accuracy

Predictive accuracy has often been used as the main and often only evaluation criterion for the predictive performance of classification or data mining algorithms. In recent years, the area under the ROC (receiver operating characteristics) curve, or simply AUC, has been proposed as an alternative single-number measure for evaluating performance of learning algorithms. We proved that AUC is, in general, a better measure (defined precisely) than accuracy. Many popular data mining algorithms should then be reevaluated in terms of AUC. For example, it is well accepted that Naive Bayes and decision trees are very similar in accuracy. How do they compare in AUC? Also, how does the recently developed SVM (support vector machine) compare to traditional learning algorithms in accuracy and AUC? We will answer these questions. Our conclusions will provide important guidelines in data mining applications on real-world datasets.

[1]  J. Ross Quinlan,et al.  C4.5: Programs for Machine Learning , 1992 .

[2]  Keinosuke Fukunaga,et al.  Introduction to Statistical Pattern Recognition , 1972 .

[3]  A. Dale Magoun,et al.  Decision, estimation and classification , 1989 .

[4]  Johan A. K. Suykens,et al.  Least Squares Support Vector Machine Classifiers , 1999, Neural Processing Letters.

[5]  D Haussler,et al.  Knowledge-based analysis of microarray gene expression data by using support vector machines. , 2000, Proceedings of the National Academy of Sciences of the United States of America.

[6]  Charles X. Ling,et al.  Toward Bayesian Classifiers with Accurate Probabilities , 2002, PAKDD.

[7]  Andrew P. Bradley,et al.  The use of the area under the ROC curve in the evaluation of machine learning algorithms , 1997, Pattern Recognit..

[8]  Pedro M. Domingos,et al.  Beyond Independence: Conditions for the Optimality of the Simple Bayesian Classifier , 1996, ICML.

[9]  Tom Fawcett,et al.  Analysis and Visualization of Classifier Performance: Comparison under Imprecise Class and Cost Distributions , 1997, KDD.

[10]  Yi Lin,et al.  Support Vector Machines and the Bayes Rule in Classification , 2002, Data Mining and Knowledge Discovery.

[11]  Aiko M. Hormann,et al.  Programs for Machine Learning. Part I , 1962, Inf. Control..

[12]  Kurt Hornik,et al.  Benchmarking Support Vector Machines , 2002 .

[13]  Alexander G. Gray,et al.  Retrofitting Decision Tree Classifiers Using Kernel Density Estimation , 1995, ICML.

[14]  Pedro M. Domingos,et al.  Tree Induction for Probability-Based Ranking , 2003, Machine Learning.

[15]  Pat Langley,et al.  An Analysis of Bayesian Classifiers , 1992, AAAI.

[16]  Peter A. Flach,et al.  Learning Decision Trees Using the Area Under the ROC Curve , 2002, ICML.

[17]  Ron Kohavi,et al.  The Case against Accuracy Estimation for Comparing Induction Algorithms , 1998, ICML.

[18]  Johan A. K. Suykens,et al.  Multiclass least squares support vector machines , 1999, IJCNN'99. International Joint Conference on Neural Networks. Proceedings (Cat. No.99CH36339).

[19]  C. W. Therrien,et al.  Decision, Estimation and Classification: An Introduction to Pattern Recognition and Related Topics , 1989 .

[20]  Chih-Jen Lin,et al.  A comparison of methods for multiclass support vector machines , 2002, IEEE Trans. Neural Networks.

[21]  Usama M. Fayyad,et al.  Multi-Interval Discretization of Continuous-Valued Attributes for Classification Learning , 1993, IJCAI.

[22]  Huan Liu,et al.  Discretization: An Enabling Technique , 2002, Data Mining and Knowledge Discovery.

[23]  Keinosuke Fukunaga,et al.  Introduction to statistical pattern recognition (2nd ed.) , 1990 .

[24]  Catherine Blake,et al.  UCI Repository of machine learning databases , 1998 .