AUC: A Better Measure than Accuracy in Comparing Learning Algorithms

Predictive accuracy has been widely used as the main criterion for comparing the predictive ability of classification systems (such as C4.5, neural networks, and Naive Bayes). Most of these classifiers also produce probability estimates of the classification, but these are completely ignored by the accuracy measure. This is often taken for granted because both the training and testing sets provide only class labels. In this paper we establish rigorously that, even in this setting, the area under the ROC (Receiver Operating Characteristics) curve, or simply AUC, provides a better measure than accuracy. Our result is significant for three reasons. First, we establish, for the first time, rigorous criteria for comparing evaluation measures for learning algorithms. Second, our result suggests that AUC should replace accuracy when measuring and comparing classification systems. Third, our result also prompts us to reevaluate many well-established conclusions based on accuracy in machine learning. For example, it is well accepted in the machine learning community that, in terms of predictive accuracy, Naive Bayes and decision trees are very similar. Using AUC, however, we show experimentally that Naive Bayes is significantly better than decision-tree learning algorithms.
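
The paper's formal comparison criteria are not reproduced here, but the core intuition, that AUC exploits the ranking induced by the probability estimates while accuracy discards it, can be illustrated with a small, self-contained sketch. The two classifiers and their scores below are hypothetical, not taken from the paper's experiments:

```python
# Minimal sketch (hypothetical data): two classifiers with identical accuracy
# at threshold 0.5 but different AUC, because AUC measures how well the
# probability estimates rank positives above negatives.

def accuracy(y_true, scores, threshold=0.5):
    """Fraction of examples whose thresholded score matches the label."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, y_true)) / len(y_true)

def auc(y_true, scores):
    """AUC via the Wilcoxon-Mann-Whitney statistic: the probability that a
    randomly chosen positive is scored above a randomly chosen negative
    (ties count as 1/2)."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

y_true   = [1, 1, 1, 0, 0, 0]
scores_a = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]   # hypothetical classifier A
scores_b = [0.9, 0.6, 0.4, 0.8, 0.2, 0.1]   # hypothetical classifier B

# Both classifiers misclassify exactly two examples at threshold 0.5 ...
print(accuracy(y_true, scores_a), accuracy(y_true, scores_b))  # ~0.667 and ~0.667
# ... but A ranks the positives higher overall, so its AUC is larger.
print(auc(y_true, scores_a), auc(y_true, scores_b))            # ~0.889 and ~0.778
```

In this toy setting accuracy cannot distinguish the two classifiers, while AUC prefers the one whose probability estimates order the examples better, which is the kind of difference the paper's criteria are designed to capture.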
