Cohen’s Linearly Weighted Kappa is a Weighted Average of 2×2 Kappas

An agreement table with n ≥ 3 ordered categories can be collapsed into n − 1 distinct 2×2 tables by combining adjacent categories. Vanbelle and Albert (Statistical Methodology 6:157–163, 2009) showed that the components of Cohen’s weighted kappa with linear weights can be obtained from these n − 1 collapsed 2×2 tables. In this paper we consider several consequences of this result. One is that the weighted kappa with linear weights can be interpreted as a weighted arithmetic mean of the kappas corresponding to the 2×2 tables, where the weights are the denominators of the 2×2 kappas. In addition, it is shown that similar results and interpretations hold for linearly weighted kappas for multiple raters.
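The central identity is easy to check numerically. Below is a minimal Python sketch of that check; the 4×4 table of joint proportions and the function names weighted_kappa_linear and collapsed_kappas are illustrative choices of ours, not taken from the paper. The first function computes the linearly weighted kappa directly; the second returns, for each of the n − 1 cut points, the kappa of the collapsed 2×2 table together with its denominator 1 − E_k.

import numpy as np

def weighted_kappa_linear(p):
    # Cohen's weighted kappa with linear weights for an n x n table of
    # joint proportions p (rows: first rater, columns: second rater).
    n = p.shape[0]
    i, j = np.indices((n, n))
    v = np.abs(i - j)                        # linear disagreement weights |i - j|
    r, c = p.sum(axis=1), p.sum(axis=0)      # marginal proportions
    d_obs = (v * p).sum()                    # observed weighted disagreement
    d_exp = (v * np.outer(r, c)).sum()       # chance-expected weighted disagreement
    return 1.0 - d_obs / d_exp

def collapsed_kappas(p):
    # Cohen's kappa and its denominator 1 - E_k for each of the n - 1
    # 2x2 tables obtained by dichotomizing the categories at cut point k.
    n = p.shape[0]
    kappas, denoms = [], []
    for k in range(1, n):
        a = p[:k, :k].sum()                  # both raters use a category <= k
        r, c = p[:k, :].sum(), p[:, :k].sum()
        obs = 1.0 - r - c + 2.0 * a          # observed agreement O_k
        exp = r * c + (1.0 - r) * (1.0 - c)  # chance-expected agreement E_k
        kappas.append((obs - exp) / (1.0 - exp))
        denoms.append(1.0 - exp)
    return np.array(kappas), np.array(denoms)

# A hypothetical 4-category table of joint proportions (sums to 1).
p = np.array([[0.15, 0.05, 0.02, 0.00],
              [0.04, 0.20, 0.05, 0.01],
              [0.01, 0.06, 0.18, 0.04],
              [0.00, 0.02, 0.05, 0.12]])

kappas, denoms = collapsed_kappas(p)
print(weighted_kappa_linear(p))                # direct computation
print((denoms * kappas).sum() / denoms.sum())  # weighted mean of the 2x2 kappas

The two printed values coincide because (1 − E_k)·κ_k = O_k − E_k for each cut point k, so summing numerators and denominators over the n − 1 cut points reproduces exactly the numerator and denominator of the linearly weighted kappa.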

[1] Cohen, J. (1968). Weighted kappa: Nominal scale agreement provision for scaled disagreement or partial credit. Psychological Bulletin, 70, 213–220.

[2] Agresti, A. (1991). Categorical Data Analysis.

[3] Fleiss, J. L., & Cohen, J. (1973). The equivalence of weighted kappa and the intraclass correlation coefficient as measures of reliability. Educational and Psychological Measurement, 33, 613–619.

[4] Nelson, J. C., & Pepe, M. S. (2000). Statistical description of interrater variability in ordinal ratings. Statistical Methods in Medical Research.

[5] Warrens, M. J. (2010). Inequalities between multi-rater kappas. Advances in Data Analysis and Classification.

[6] Fleiss, J. L. (1974). Statistical Methods for Rates and Proportions.

[7] Warrens, M. J. (2008). On the equivalence of Cohen’s kappa and the Hubert-Arabie Adjusted Rand Index. Journal of Classification.

[8] Berry, K. J., Johnston, J. E., & Mielke, P. W. (2007). The exact variance of weighted kappa with multiple raters. Psychological Reports.

[9] Vanbelle, S., & Albert, A. (2009). Agreement between an isolated rater and a group of raters. Statistica Neerlandica.

[10] Warrens, M. J. (2010). Cohen’s kappa can always be increased and decreased by combining categories. Statistical Methodology.

[11] Allison, T., et al. (1971). A new procedure for assessing reliability of scoring EEG sleep recordings.

[12] Reed, J. C. (1960). Book review: Visual Perceptual Abilities and Early Reading Progress by Jean Turner Goins (Supplementary Educational Monographs No. 87, Chicago: University of Chicago Press, 1958).

[13] Schouten, H. J. A. (1986). Nominal scale agreement among observers. Psychometrika.

[14] Kraemer, H. C. (1979). Ramifications of a population model for κ as a coefficient of reliability. Psychometrika, 44, 461–472.

[15] Berry, K. J., et al. (2009). A note on Cohen’s weighted kappa coefficient of agreement with linear weights.

[16] Westergren, A., et al. (2005). Statistical methods for assessing agreement for ordinal data. Scandinavian Journal of Caring Sciences.

[17] Janson, H., & Olsson, U. (2001). A measure of agreement for interval or nominal multivariate observations. Educational and Psychological Measurement.

[18] Warrens, M. J. (2010). A Kraemer-type rescaling that transforms the odds ratio into the weighted kappa coefficient. Psychometrika.

[19] Kundel, H. L., & Polansky, M. (2003). Measurement of observer agreement. Radiology.

[20] Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20, 37–46.

[21] Brennan, R. L., & Prediger, D. J. (1981). Coefficient kappa: Some uses, misuses, and alternatives. Educational and Psychological Measurement.

[22] Zwick, R. (1988). Another look at interrater agreement. Psychological Bulletin.

[23] Vanbelle, S., & Albert, A. (2009). A note on the linearly weighted kappa coefficient for ordinal scales. Statistical Methodology, 6, 157–163.

[24] Johnston, J. E., Berry, K. J., & Mielke, P. W. (2008). Resampling probability values for weighted kappa with multiple raters. Psychological Reports.

[25] Visser, H., & de Nijs, T. (2006). The Map Comparison Kit. Environmental Modelling & Software.

[26] Berry, K. J., & Mielke, P. W. (1988). A generalization of Cohen’s kappa agreement measure to interval measurement and multiple raters. Educational and Psychological Measurement.

[27] Fleiss, J. L., Cohen, J., & Everitt, B. S. (1969). Large sample standard errors of kappa and weighted kappa. Psychological Bulletin, 72, 323–327.

[28] Landis, J. R., & Koch, G. G. (1977). An application of hierarchical kappa-type statistics in the assessment of majority agreement among multiple observers. Biometrics, 33, 363–374.

[29] Warrens, M. J. (2010). Inequalities between kappa and kappa-like statistics for k×k tables. Psychometrika.

[30] Warrens, M. J. (2011). Weighted kappa is higher than Cohen’s kappa for tridiagonal agreement tables. Statistical Methodology.

[31] Davies, M., & Fleiss, J. L. (1982). Measuring agreement for multinomial data. Biometrics.

[32] Warrens, M. J. (2010). A formal proof of a paradox associated with Cohen’s kappa. Journal of Classification.

[33] Popping, R. (2010). Some views on agreement to be used in content analysis studies.

[34] Kraemer, H. C., Periyakoil, V. S., & Noda, A. (2002). Kappa coefficients in medical research. Statistics in Medicine.

[35] Scott, W. A. (1955). Reliability of content analysis: The case of nominal scale coding. Public Opinion Quarterly, 19, 321–325.

[36] Popping, R. (1983). Overeenstemmingsmaten voor nominale data [Agreement measures for nominal data].

[37] Schuster, C. (2004). A note on the interpretation of weighted kappa and its relations to other rater agreement statistics for metric scales. Educational and Psychological Measurement.

[38] Vanbelle, S., & Albert, A. (2009). Agreement between two independent groups of raters. Psychometrika.

[39] Fleiss, J. L. (1971). Measuring nominal scale agreement among many raters. Psychological Bulletin, 76, 378–382.

[40] Conger, A. J. (1980). Integration and generalization of kappas for multiple raters. Psychological Bulletin.

[41] Brenner, H., & Kliebsch, U. (1996). Dependence of weighted kappa coefficients on the number of categories. Epidemiology.

[42] Hsu, L. M., & Field, R. (2003). Interrater agreement measures: Comments on kappaₙ, Cohen’s kappa, Scott’s π, and Aickin’s α. Understanding Statistics.

[43] Warrens, M. J. (2008). On similarity coefficients for 2×2 tables and correction for chance. Psychometrika.

[44] Krippendorff, K. (2004). Reliability in content analysis: Some common misconceptions and recommendations. Human Communication Research.

[45] Warrens, M. J. (2009). k-Adic similarity coefficients for binary (presence/absence) data. Journal of Classification.

[46] Holmquist, N. D., et al. (1967). Variability in classification of carcinoma in situ of the uterine cervix. Archives of Pathology.