Testing the equality of two dependent kappa statistics.

Procedures are developed and compared for testing the equality of two dependent kappa statistics in the case of two raters and a dichotomous outcome variable. Such problems may arise when each subject in a sample is rated under two distinct settings, and it is of interest to compare the observed levels of inter-observer and intra-observer agreement. The procedures compared are extensions of previously developed procedures for comparing kappa statistics computed from independent samples. The results of a Monte Carlo simulation show that adjusting for the dependency between samples tends to be worthwhile only if the between-setting correlation is comparable in magnitude to the within-setting correlations. In this case, a goodness-of-fit procedure that takes into account the dependency between samples is recommended.
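For the two-rater, dichotomous-outcome case described above, the statistic being compared across settings is Cohen's kappa. The sketch below is illustrative only, not the paper's testing procedure: it computes kappa for each setting and uses a paired bootstrap (resampling subjects, so that each subject's ratings from both settings stay together and the between-setting dependency is preserved). All function names and data are hypothetical.

```python
import random


def cohen_kappa(a, b):
    """Cohen's kappa for two raters on a dichotomous (0/1) outcome:
    kappa = (p_o - p_e) / (1 - p_e), where p_o is observed agreement and
    p_e is chance agreement from the raters' marginal proportions."""
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n
    pa, pb = sum(a) / n, sum(b) / n
    p_e = pa * pb + (1 - pa) * (1 - pb)
    if p_e == 1.0:
        return float("nan")  # degenerate: neither rater shows any variation
    return (p_o - p_e) / (1 - p_e)


def paired_bootstrap_diff(setting1, setting2, n_boot=2000, seed=0):
    """Bootstrap replicates of kappa_1 - kappa_2 for dependent samples.

    setting1 and setting2 are lists of (rater_a, rater_b) rating pairs for
    the SAME subjects under two settings; resampling whole subjects keeps
    the between-setting correlation intact."""
    rng = random.Random(seed)
    n = len(setting1)
    diffs = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        a1 = [setting1[i][0] for i in idx]
        b1 = [setting1[i][1] for i in idx]
        a2 = [setting2[i][0] for i in idx]
        b2 = [setting2[i][1] for i in idx]
        diffs.append(cohen_kappa(a1, b1) - cohen_kappa(a2, b2))
    return diffs
```

A bootstrap interval for the difference that excludes zero would suggest the two agreement levels differ; the paper's goodness-of-fit approach addresses the same question analytically rather than by resampling.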
