On the Marginal Dependency of Cohen’s κ

Cohen’s κ (kappa) is typically used as a measure of the degree of rater agreement. It is often criticized for being marginal-dependent. In this article, this characteristic is explained and illustrated in the context of (1) nonuniform marginal probability distributions, (2) odds ratios that remain constant while κ changes as the marginal distributions vary, and (3) percentages of raw agreement that remain constant while κ changes as the marginal distributions vary. The meaning and interpretation of κ are explained with reference to the log-linear main effect model of variable independence, which is used to estimate the expected cell frequencies of agreement tables. It is shown that the interpretation of κ as a measure of the degree of agreement is incorrect. The correct interpretation is that κ assesses the degree of agreement beyond that expected under a statistical model such as the independence or the null model. Based on Goodman’s (1991) distinction between ...
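To make the marginal dependency concrete, recall that κ = (p_o − p_e)/(1 − p_e), where p_o is the observed proportion of agreement and p_e = Σ_i p_i+ · p_+i is the agreement expected under the independence model. The minimal Python sketch below (not drawn from the article; the tables and function are hypothetical) compares two 2 × 2 agreement tables that share 90% raw agreement but differ in their margins, illustrating point (3) above:

import numpy as np

def cohen_kappa(table):
    """Cohen's kappa for a square agreement table.

    p_o is the mass on the main diagonal; p_e is the agreement expected
    under the log-linear main effect (independence) model, computed from
    the cross-products of the row and column margins.
    """
    table = np.asarray(table, dtype=float)
    n = table.sum()
    p_o = np.trace(table) / n
    p_e = table.sum(axis=1) @ table.sum(axis=0) / n**2
    return (p_o - p_e) / (1.0 - p_e)

# Both hypothetical tables have 90/100 = 90% raw agreement,
# but balanced (50/50) versus skewed (85/15) margins.
balanced = [[45, 5], [5, 45]]
skewed = [[80, 5], [5, 10]]

print(cohen_kappa(balanced))  # 0.80
print(cohen_kappa(skewed))    # approx. 0.61

The balanced table yields κ = 0.80, the skewed table κ ≈ 0.61: identical raw agreement, different κ, purely because the marginal distributions changed.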

[1] P. Mair, et al. (2006). Significance Tests for the Measure of Raw Agreement.

[2] Rater Agreement – Kappa (2005).

[3] Alexander von Eye, et al. (2005). Configural Frequency Analysis.

[4] Werner Vach, et al. (2005). The dependence of Cohen's kappa on the prevalence does not matter. Journal of Clinical Epidemiology.

[5] B. S. Everitt & D. C. Howell (Eds.) (2005). Encyclopedia of Statistics in Behavioral Science.

[6] A. von Eye, et al. (2005). Can One Use Cohen’s Kappa to Examine Disagreement?

[7] Characteristics of Measures for 2 × 2 Tables (2003).

[8] J. Bortz, et al. (1998). Kurzgefaßte Statistik für die klinische Forschung [Concise statistics for clinical research].

[9] V. Farewell, et al. (1995). Conditional inference for subject-specific and marginal agreement: Two families of agreement measures.

[10] Irene Guggenmoos-Holzmann (1995). Modelling covariate effects in observer agreement studies: The case of nominal scale agreement.

[11] Alexander von Eye, et al. (1995). Concepts of nonindependence in Configural Frequency Analysis.

[12] Leo A. Goodman, et al. (1991). Measures, Models, and Graphical Displays in the Analysis of Cross-Classified Data.

[13] Models of Chance when Measuring Interrater Agreement with Kappa (1991).

[14] T. Wickens (1989). Multiway Contingency Tables Analysis for the Social Sciences.

[15] M. Llabre, et al. (1985). The Equivalence of Kappa and Del.

[16] Dale J. Prediger, et al. (1981). Coefficient Kappa: Some Uses, Misuses, and Alternatives.

[17] L. A. Goodman, et al. (1979). Measures of association for cross classifications.

[18] J. R. Landis, et al. (1977). The measurement of observer agreement for categorical data. Biometrics.

[19] J. Fleiss (1975). Measuring agreement between two judges on the presence or absence of a trait. Biometrics.

[20] Jacob Cohen (1968). Weighted kappa: Nominal scale agreement provision for scaled disagreement or partial credit.

[21] Jacob Cohen (1960). A Coefficient of Agreement for Nominal Scales.