A Generalization of Cohen's Kappa Agreement Measure to Interval Measurement and Multiple Raters

Cohen's kappa statistic is frequently used to measure agreement between two observers employing categorical polytomies. In this paper, Cohen's statistic is shown to be inherently multivariate in nature; it is expanded to analyze ordinal and interval data; and it is extended to more than two observers. A nonasymptotic test of significance is provided for the generalized statistic.
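For orientation, the classical two-rater, nominal-scale statistic that the paper generalizes is kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed proportion of agreement and p_e is the agreement expected by chance from the raters' marginal category proportions. The sketch below is a minimal Python illustration of this baseline two-rater kappa only; the function name and toy ratings are assumptions for illustration and do not reproduce the multivariate, multi-rater extension or the nonasymptotic test developed in the paper.

    # Classical two-rater, nominal-scale kappa: (p_o - p_e) / (1 - p_e).
    from collections import Counter

    def cohens_kappa(ratings_a, ratings_b):
        n = len(ratings_a)
        assert n == len(ratings_b) and n > 0
        # Observed proportion of exact agreement.
        p_o = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
        # Chance agreement from the product of the two raters' marginals.
        marg_a = Counter(ratings_a)
        marg_b = Counter(ratings_b)
        p_e = sum(marg_a[c] * marg_b.get(c, 0) for c in marg_a) / (n * n)
        return (p_o - p_e) / (1 - p_e)

    if __name__ == "__main__":
        # Hypothetical example data for two raters on six subjects.
        rater_a = ["yes", "yes", "no", "no", "yes", "no"]
        rater_b = ["yes", "no", "no", "no", "yes", "yes"]
        print(f"kappa = {cohens_kappa(rater_a, rater_b):.3f}")  # 0.333
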
