Inference Procedures for Assessing Interobserver Agreement among Multiple Raters

Summary. We propose a new procedure for constructing inferences about a measure of interobserver agreement in studies involving a binary outcome and multiple raters. The proposed procedure, based on a chi‐square goodness‐of‐fit test as applied to the correlated binomial model (Bahadur, 1961, in Studies in Item Analysis and Prediction, 158–176), is an extension of the goodness‐of‐fit procedure developed by Donner and Eliasziw (1992, Statistics in Medicine 11, 1511–1519) for the case of two raters. The new procedure is shown to provide confidence‐interval coverage levels that are close to nominal over a wide range of parameter combinations. The procedure also provides a sample‐size formula that may be used to determine the required number of subjects and raters for such studies.
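As a rough illustration of the goodness-of-fit idea described in the summary, the sketch below inverts a chi-square goodness-of-fit statistic over a grid of candidate agreement values under a second-order Bahadur (correlated binomial) model. It is not the authors' exact procedure: the function names (`bahadur_probs`, `gof_statistic`, `gof_interval`), the plug-in pooled estimate of the prevalence, the use of one degree of freedom, the grid search, and the example counts are all assumptions made for illustration.

```python
from math import comb, inf

# Assumption: 95th percentile of the chi-square distribution with 1 df.
CHI2_1DF_95 = 3.841458820694124


def bahadur_probs(n, pi, rho):
    """P(X = x), x = 0..n, for n raters under a second-order Bahadur model
    with success probability pi and common pairwise correlation rho."""
    probs = []
    for x in range(n + 1):
        binom = comb(n, x) * pi**x * (1 - pi) ** (n - x)
        # Second-order correction term; it reduces to 1 when rho = 0.
        corr = 1 + rho * ((x - n * pi) ** 2 + x * (2 * pi - 1) - n * pi**2) / (
            2 * pi * (1 - pi)
        )
        probs.append(binom * corr)
    return probs


def gof_statistic(counts, rho):
    """Pearson goodness-of-fit statistic for a candidate rho.
    counts[x] = number of subjects rated positive by exactly x of the n raters."""
    n = len(counts) - 1
    total = sum(counts)
    pi_hat = sum(x * c for x, c in enumerate(counts)) / (n * total)
    if not 0 < pi_hat < 1:
        raise ValueError("pooled prevalence must lie strictly between 0 and 1")
    probs = bahadur_probs(n, pi_hat, rho)
    if min(probs) <= 0:
        return inf  # rho is outside the range where the model is a valid distribution
    return sum((c - total * p) ** 2 / (total * p) for c, p in zip(counts, probs))


def gof_interval(counts, crit=CHI2_1DF_95, grid=2000):
    """Grid-search interval: all rho in [-1, 1] not rejected by the test."""
    rhos = [-1 + 2 * i / grid for i in range(grid + 1)]
    accepted = [r for r in rhos if gof_statistic(counts, r) <= crit]
    return (min(accepted), max(accepted)) if accepted else None


# Hypothetical data: 100 subjects, 3 raters.
example_counts = [40, 10, 10, 40]
print(gof_interval(example_counts))
```

A grid search is used here only to keep the sketch dependency-free; solving for the roots of the statistic directly would be the more precise choice.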

[1] S. Lipsitz, et al. Efficient Estimation of the Intraclass Correlation for a Binary Trait, 1996.

[2] D. Bowman, et al. A full likelihood procedure for analysing exchangeable binary data, 1995, Biometrics.

[3] H. Kraemer, et al. A goodness-of-fit approach to inference procedures for the kappa statistic: confidence interval construction, significance-testing and sample size estimation, 1994, Statistics in medicine.

[4] J. Fleiss, et al. Interval estimation under two study designs for kappa with binary classifications, 1993, Biometrics.

[5] A. Donner, et al. A goodness-of-fit approach to inference procedures for the kappa statistic: confidence interval construction, significance-testing and sample size estimation, 1992, Statistics in medicine.

[6] H. Kraemer, et al. How many raters? Toward the most reliable diagnostic consensus, 1992, Statistics in medicine.

[7] H. Kraemer, et al. 2 x 2 kappa coefficients: measures of agreement or association, 1989, Biometrics.

[8] R. Prentice, et al. Correlated binary regression with covariates specific to each binary observation, 1988, Biometrics.

[9] T. Mak. Analysing Intraclass Correlation for Dichotomous Variables, 1988.

[10] Martin A. Tanner, et al. Modeling Agreement among Raters, 1985.

[11] J. R. Landis, et al. The measurement of observer agreement for categorical data, 1977, Biometrics.

[12] W. Conover. Statistical Methods for Rates and Proportions, 1974.

[13] J. Fleiss, et al. Statistical methods for rates and proportions, 1973.

[14] F. Yates, et al. Statistical methods for research workers. 5th edition, 1935.