Agreement between Two Independent Groups of Raters

We propose a coefficient of agreement for assessing the degree of concordance between two independent groups of raters classifying items on a nominal scale. The coefficient, defined under a population-based model, extends the classical Cohen's kappa coefficient for quantifying agreement between two raters. Weighted and intraclass versions of the coefficient are also given, and their sampling variance is determined by the jackknife method. The method is illustrated on the medical education data that motivated the research.
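
Since the proposed coefficient builds on Cohen's kappa and uses a jackknife variance estimate, the following minimal Python sketch illustrates those two ingredients only: the classical two-rater kappa and a leave-one-item-out jackknife variance. It does not implement the group-agreement coefficient proposed in the paper; the function names (`cohen_kappa`, `jackknife_variance`) and the assumption that categories are coded as integers 0..K-1 are illustrative choices, not from the source.

```python
import numpy as np

def cohen_kappa(r1, r2):
    """Classical Cohen's kappa for two raters classifying the same items
    on a nominal scale (categories coded as integers 0..K-1)."""
    r1, r2 = np.asarray(r1), np.asarray(r2)
    k = int(max(r1.max(), r2.max())) + 1
    # Joint classification table, converted to proportions
    table = np.zeros((k, k))
    for a, b in zip(r1, r2):
        table[a, b] += 1
    table /= table.sum()
    po = np.trace(table)                        # observed agreement
    pe = table.sum(axis=1) @ table.sum(axis=0)  # chance-expected agreement
    return (po - pe) / (1 - pe)

def jackknife_variance(r1, r2):
    """Leave-one-item-out jackknife variance of Cohen's kappa."""
    n = len(r1)
    loo = np.array([cohen_kappa(np.delete(r1, i), np.delete(r2, i))
                    for i in range(n)])
    return (n - 1) / n * np.sum((loo - loo.mean()) ** 2)

# Example: 10 items rated into 3 categories by two raters
rater1 = [0, 1, 2, 1, 0, 2, 1, 0, 2, 1]
rater2 = [0, 1, 2, 0, 0, 2, 1, 1, 2, 1]
print(cohen_kappa(rater1, rater2), jackknife_variance(rater1, rater2))
```

The same leave-one-out principle extends to the weighted and intraclass versions mentioned in the abstract, by recomputing the chosen coefficient with each item removed in turn.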
