Cohen's kappa is a weighted average

Abstract: The κ coefficient is a popular descriptive statistic for summarizing an agreement table. It is sometimes desirable to combine some of the categories, for example when categories are easily confused, and then calculate κ for the collapsed table. Since the categories of an agreement table are nominal and the order in which they are listed is irrelevant, combining categories of an agreement table is equivalent to partitioning the categories into subsets. In this paper we prove that, given a partition type of the categories, the overall κ-value of the original table is a weighted average of the κ-values of the collapsed tables corresponding to all partitions of that type, where the weights are the denominators of the kappas of the subtables. An immediate consequence is that Cohen’s κ can be interpreted as a weighted average of the κ-values of the agreement tables corresponding to all non-trivial partitions. The κ-value of the 2 × 2 table obtained by combining all categories other than the one of current interest into a single “all others” category reflects the reliability of that individual category. Since the overall κ-value is a weighted average of these 2 × 2 κ-values, the category reliabilities indicate how each category contributes to the overall κ-value. It would be good practice to report both the overall κ-value and the category reliabilities of an agreement table.
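To make the weighted-average property concrete for the special case of category reliabilities: if P_o and P_e denote the observed and chance-expected agreement of the full table, and O_i and E_i those of the 2 × 2 table in which category i is kept and all other categories are merged into “all others”, then κ = (P_o − P_e)/(1 − P_e) equals Σ_i (1 − E_i) κ_i / Σ_i (1 − E_i), with κ_i = (O_i − E_i)/(1 − E_i). The sketch below checks this identity numerically; the 3 × 3 table, the function names, and the use of NumPy are illustrative assumptions, not part of the paper.

```python
import numpy as np

def kappa_and_denominator(table):
    """Cohen's kappa and its denominator 1 - P_e for a square table of counts."""
    p = table / table.sum()
    p_o = np.trace(p)                        # observed proportion of agreement
    p_e = p.sum(axis=1) @ p.sum(axis=0)      # agreement expected under independence
    return (p_o - p_e) / (1 - p_e), 1 - p_e

def collapse_to_2x2(table, i):
    """Keep category i and merge all remaining categories into 'all others'."""
    others = [j for j in range(table.shape[0]) if j != i]
    return np.array([
        [table[i, i],            table[i, others].sum()],
        [table[others, i].sum(), table[np.ix_(others, others)].sum()],
    ])

# Hypothetical 3 x 3 agreement table of counts (illustrative only).
T = np.array([[25,  5,  2],
              [ 4, 30,  6],
              [ 1,  7, 20]])

overall_kappa, _ = kappa_and_denominator(T)

# Category reliabilities: the 2 x 2 kappas, weighted by their own denominators.
pairs = [kappa_and_denominator(collapse_to_2x2(T, i)) for i in range(T.shape[0])]
kappas = np.array([k for k, _ in pairs])
weights = np.array([w for _, w in pairs])

weighted_average = np.sum(weights * kappas) / np.sum(weights)
print(f"overall kappa    = {overall_kappa:.6f}")
print(f"weighted average = {weighted_average:.6f}")   # identical up to rounding
```

On this illustrative table the three category reliabilities are roughly 0.72, 0.55, and 0.60, and their weighted average coincides with the overall κ of about 0.62, in line with the result stated in the abstract.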
