Using AUC and accuracy in evaluating learning algorithms

The area under the ROC (receiver operating characteristics) curve, or simply AUC, has traditionally been used in medical diagnosis since the 1970s. It has recently been proposed as an alternative single-number measure for evaluating the predictive ability of learning algorithms. However, no formal arguments have been given as to why AUC should be preferred over accuracy. We establish formal criteria for comparing two different measures for learning algorithms, and we show theoretically and empirically that AUC is a better measure than accuracy in a precisely defined sense. We then use AUC to reevaluate well-established claims in machine learning that were based on accuracy, and obtain interesting and surprising new results. For example, it is well established and widely accepted that Naive Bayes and decision trees are very similar in predictive accuracy. We show, however, that Naive Bayes is significantly better than decision trees in AUC. The conclusions drawn in this paper may have a significant impact on machine learning and data mining applications.
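To make the contrast between the two measures concrete, the following minimal Python sketch computes accuracy and AUC for two hypothetical classifiers. It is not the paper's experimental setup: the labels, the scores, the 0.5 decision threshold, and the function names are all assumptions chosen for illustration. AUC is computed in its standard pairwise form, as the fraction of (positive, negative) pairs in which the positive example receives the higher score, with ties counted as one half.

```python
def accuracy(labels, scores, threshold=0.5):
    """Fraction of examples whose thresholded score matches the 0/1 label.

    The 0.5 threshold is an assumption made for this illustration.
    """
    correct = sum((s >= threshold) == (y == 1) for y, s in zip(labels, scores))
    return correct / len(labels)


def auc(labels, scores):
    """Pairwise (rank-based) AUC.

    Fraction of (positive, negative) pairs in which the positive example
    is scored higher than the negative one; ties count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical data: three positives followed by three negatives.
labels = [1, 1, 1, 0, 0, 0]
# Scores from two made-up classifiers, A and B.
scores_a = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2]
scores_b = [0.9, 0.6, 0.1, 0.7, 0.3, 0.2]

acc_a, acc_b = accuracy(labels, scores_a), accuracy(labels, scores_b)
auc_a, auc_b = auc(labels, scores_a), auc(labels, scores_b)

print(f"A: accuracy={acc_a:.3f}  AUC={auc_a:.3f}")
print(f"B: accuracy={acc_b:.3f}  AUC={auc_b:.3f}")
```

In this toy data both classifiers make the same two errors at the threshold, so their accuracies are equal, yet A ranks the positives above the negatives more consistently and therefore has the higher AUC. This only illustrates informally how AUC can separate classifiers that accuracy cannot; the formal criteria are those established in the paper itself.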
