Death to Kappa: birth of quantity disagreement and allocation disagreement for accuracy assessment

The family of Kappa indices of agreement claims to compare a map's observed classification accuracy with the expected accuracy of baseline maps that can have two types of randomness: (1) random distribution of the quantity of each category and (2) random spatial allocation of the categories. Use of the Kappa indices has become part of the culture in remote sensing and other fields. This article examines five different Kappa indices, some of which were derived by the first author in 2000. We expose the indices' properties mathematically and illustrate their limitations graphically, with emphasis on Kappa's use of randomness as a baseline and on the often-ignored conversion from an observed sample matrix to the estimated population matrix. This article concludes that these Kappa indices are useless, misleading and/or flawed for the practical applications in remote sensing that we have seen. After more than a decade of working with these indices, we recommend that the profession abandon the use of Kappa indices for purposes of accuracy assessment and map comparison, and instead summarize the cross-tabulation matrix with two much simpler summary parameters: quantity disagreement and allocation disagreement. This article shows how to compute these two parameters using examples taken from peer-reviewed literature.
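As a rough illustration of the two summary parameters, the sketch below computes quantity disagreement and allocation disagreement from a square cross-tabulation matrix. The abstract itself does not state the formulas; the ones used here are the commonly cited definitions (per-category quantity disagreement as the absolute difference between a category's row total and column total, per-category allocation disagreement as twice the smaller of its off-diagonal row and column sums, each summed over categories and halved). The function name `disagreement_components` and the example matrix are hypothetical, and the sketch assumes the input already represents the estimated population matrix (for instance, counts from a census or a simple random sample), so no sample-to-population conversion is performed.

```python
def disagreement_components(matrix):
    """Return (quantity, allocation, total) disagreement as proportions.

    `matrix` is a square list of lists: rows are the comparison map's
    categories, columns are the reference categories. Entries may be
    counts or proportions; they are normalized to sum to 1.
    Assumption: the matrix already estimates the population matrix.
    """
    n = len(matrix)
    grand_total = sum(sum(row) for row in matrix)
    p = [[cell / grand_total for cell in row] for row in matrix]

    # Marginal proportion of each category in the comparison map (rows)
    # and in the reference data (columns).
    row_tot = [sum(p[g]) for g in range(n)]
    col_tot = [sum(p[i][g] for i in range(n)) for g in range(n)]

    # Quantity disagreement: mismatch in the proportions of categories.
    quantity = sum(abs(row_tot[g] - col_tot[g]) for g in range(n)) / 2

    # Allocation disagreement: mismatch in spatial allocation, given
    # the proportions. For each category, take twice the smaller of its
    # commission (row off-diagonal) and omission (column off-diagonal).
    allocation = sum(
        2 * min(row_tot[g] - p[g][g], col_tot[g] - p[g][g])
        for g in range(n)
    ) / 2

    # Total disagreement: one minus the proportion on the diagonal.
    total = 1 - sum(p[g][g] for g in range(n))
    return quantity, allocation, total


# Hypothetical two-category example (counts of pixels).
example = [[40, 10],
           [20, 30]]
q, a, d = disagreement_components(example)
print(q, a, d)
```

Under these definitions the two components sum to the total disagreement, which gives a quick sanity check on any matrix passed in.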
