Asymptotic variability of (multilevel) multirater kappa coefficients

Agreement studies are of paramount importance in various scientific domains. When several observers classify objects on categorical scales, agreement can be quantified through multirater kappa coefficients. In most statistical packages, however, the standard error of these coefficients is available only under the null hypothesis that the coefficient equals zero, which prevents the construction of confidence intervals in the general case. The aim of this paper is threefold. First, simple analytic formulae for the standard error of multirater kappa coefficients will be given in the general case. Second, these formulae will be extended to multilevel data structures. The formulae are based on simple matrix algebra and are implemented in the R package “multiagree”. Third, guidelines on the choice between the different multirater kappa coefficients will be provided.
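To illustrate the matrix-algebra form such coefficients take, the sketch below computes the point estimate of Fleiss' (1971) multirater kappa in R from an N x k matrix of classification counts. This is a minimal sketch of the classical coefficient only, not of the paper's standard-error formulae; the function name fleiss_kappa is hypothetical and is not part of the “multiagree” package.

## Minimal sketch: Fleiss' (1971) multirater kappa for N subjects each
## rated by the same number of raters on k categories. X[i, j] holds the
## number of raters assigning subject i to category j.
## The function name fleiss_kappa is illustrative only.
fleiss_kappa <- function(X) {
  n <- sum(X[1, ])                           # raters per subject (assumed constant)
  N <- nrow(X)                               # number of subjects
  P_i <- (rowSums(X^2) - n) / (n * (n - 1))  # per-subject observed agreement
  P_bar <- mean(P_i)                         # mean observed agreement
  p_j <- colSums(X) / (N * n)                # marginal category proportions
  P_e <- sum(p_j^2)                          # chance-expected agreement
  (P_bar - P_e) / (1 - P_e)                  # kappa point estimate
}

## Example: 4 subjects, 3 categories, 6 raters per subject
X <- matrix(c(6, 0, 0,
              2, 3, 1,
              0, 6, 0,
              1, 1, 4), nrow = 4, byrow = TRUE)
fleiss_kappa(X)   # around 0.48

For the coefficients, their general-case standard errors, and the multilevel extensions discussed in the paper itself, the “multiagree” package should be consulted directly.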
