About the relationship between ROC curves and Cohen's kappa

Receiver operating characteristic (ROC) curves are powerful tools for measuring classifier accuracy in binary classification problems, but their usefulness in the multi-class problems that dominate real-world applications has yet to be demonstrated. In these frequently occurring multi-class settings, simple accuracy measures that compensate for random successes, such as the kappa statistic, are needed. ROC curves are two-dimensional graphs, whereas kappa is a scalar, and the two originate in entirely different disciplines. This research investigates whether they nevertheless have anything in common. A mathematical formulation that links ROC spaces with the kappa statistic is derived here for the first time. Understanding how these two accuracy measures relate to each other can lead to a better appreciation of their respective strengths and weaknesses.
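As a rough illustration of the kind of link the abstract refers to (not the paper's own derivation), a point in binary ROC space, i.e. a (false-positive-rate, true-positive-rate) pair, together with the positive-class prevalence, fully determines the confusion-matrix proportions and hence Cohen's kappa. The sketch below assumes this standard two-class setup; the function name and the example numbers are illustrative only.

    # Minimal sketch: Cohen's kappa implied by a point in ROC space,
    # given the positive-class prevalence (illustrative, not the paper's formulation).

    def kappa_from_roc_point(tpr, fpr, prevalence):
        """Kappa for a binary classifier located at (fpr, tpr) in ROC space."""
        # Observed agreement: correctly classified positives plus correctly classified negatives.
        p_o = prevalence * tpr + (1.0 - prevalence) * (1.0 - fpr)
        # Chance agreement: products of the marginal probabilities of each class.
        p_pred_pos = prevalence * tpr + (1.0 - prevalence) * fpr
        p_e = p_pred_pos * prevalence + (1.0 - p_pred_pos) * (1.0 - prevalence)
        if p_e == 1.0:
            return 0.0  # degenerate case: chance agreement is already perfect
        return (p_o - p_e) / (1.0 - p_e)

    if __name__ == "__main__":
        # Example: a classifier at (FPR = 0.2, TPR = 0.8) on a balanced problem.
        print(kappa_from_roc_point(tpr=0.8, fpr=0.2, prevalence=0.5))  # prints 0.6

The example shows why the two measures are comparable at all: once the prevalence is fixed, every ROC-space point maps to a single kappa value, so curves of constant kappa can be traced in the ROC plane.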
