Intraclass correlation for two-by-two tables under three sampling designs.

Several sampling designs for assessing agreement between two binary classifications on each of n subjects lead to data arrayed in a four-fold table. Following Kraemer's (1979, Psychometrika 44, 461-472) approach, population models are described for binary data analogous to quantitative data models for a one-way random design, a two-way mixed design, and a two-way random design. For each of these models, parameters representing intraclass correlation are defined, and two estimators are proposed: one obtained by constructing ANOVA-type tables for binary data, and one by the method of maximum likelihood. The maximum likelihood estimator of intraclass correlation for the two-way mixed design is the same as the phi coefficient (Chedzoy, 1985, in Encyclopedia of Statistical Sciences, Vol. 6, New York: Wiley). For moderately large samples, the ANOVA estimator for the two-way random design approximates Cohen's (1960, Educational and Psychological Measurement 20, 37-46) kappa statistic. Comparisons among the estimators indicate very little difference in values for tables with marginal symmetry. Differences among the estimators increase with increasing marginal asymmetry, and as the average prevalence approaches .50.
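The paper's own estimators are defined through its population models, but the two familiar statistics it connects them to, the phi coefficient and Cohen's kappa, have standard sample formulas for a four-fold table. The sketch below computes both from the cell counts; the function name and cell labels are illustrative choices of mine, not the paper's notation, and the example table is hypothetical.

```python
import math

def phi_and_kappa(a, b, c, d):
    """Phi coefficient and Cohen's kappa for a 2x2 agreement table.

    Cell counts (illustrative labeling, not the paper's notation):
        a = both classifications positive
        b = first positive, second negative
        c = first negative, second positive
        d = both classifications negative
    """
    n = a + b + c + d
    # Marginal totals for each classification
    r1_pos, r1_neg = a + b, c + d
    r2_pos, r2_neg = a + c, b + d

    # Phi: the Pearson correlation of the two binary ratings
    phi = (a * d - b * c) / math.sqrt(r1_pos * r1_neg * r2_pos * r2_neg)

    # Kappa: observed agreement corrected for chance agreement
    p_o = (a + d) / n
    p_e = (r1_pos * r2_pos + r1_neg * r2_neg) / n ** 2
    kappa = (p_o - p_e) / (1 - p_e)
    return phi, kappa

# A table with nearly symmetric margins (50/50 vs. 45/55):
# as the abstract notes, the estimators then differ very little.
phi, kappa = phi_and_kappa(40, 10, 5, 45)
print(round(phi, 4), round(kappa, 4))  # 0.7035 0.7
```

With strongly asymmetric margins the two values diverge, which is the behavior the abstract's comparisons quantify.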

[1] J. Cohen. A Coefficient of Agreement for Nominal Scales. 1960.

[2] A. Robertson, et al. The Heritability of All-or-None Traits: Viability of Poultry. Genetics, 1949.

[3] H. Kraemer, et al. 2 x 2 kappa coefficients: measures of agreement or association. Biometrics, 1989.

[4] J. Fleiss. Statistical Methods for Rates and Proportions. 1974.

[5] J. Fleiss. Estimating the accuracy of dichotomous judgments. Psychometrika, 1965.

[6] T. Mak. Analysing Intraclass Correlation for Dichotomous Variables. 1988.

[7] C. Bodian, et al. The epidemiology of gross cystic disease of the breast confirmed by biopsy or by aspiration of cyst fluid. Cancer Detection and Prevention, 1992.

[8] D. Cicchetti. When diagnostic agreement is high, but reliability is low: some paradoxes occurring in joint independent neuropsychology assessments. Journal of Clinical and Experimental Neuropsychology, 1988.

[9] J. Rosai, et al. Borderline Epithelial Lesions of the Breast. The American Journal of Surgical Pathology, 1991.

[10] S. D. Walter, et al. A reappraisal of the kappa coefficient. Journal of Clinical Epidemiology, 1988.

[11] J. Fleiss. Measuring agreement between two judges on the presence or absence of a trait. Biometrics, 1975.

[12] C. Bodian, et al. Some Limitations on Studies about the Relation between Gross Cystic Disease and Risk of Subsequent Breast Cancer. Annals of the New York Academy of Sciences, 1990.

[13] H. Kraemer. Ramifications of a population model for κ as a coefficient of reliability. Psychometrika, 1979.

[14] J. Bartko. The Intraclass Correlation Coefficient as a Measure of Reliability. Psychological Reports, 1966.

[15] W. A. Scott, et al. Reliability of Content Analysis: The Case of Nominal Scale Coding. 1955.

[16] J. S. Uebersax. Latent class analysis of diagnostic agreement. Statistics in Medicine, 1990.

[17] J. R. Landis, et al. A one-way components of variance model for categorical data. 1977.

[18] A. Feinstein, et al. High agreement but low kappa: II. Resolving the paradoxes. Journal of Clinical Epidemiology, 1990.

[19] J. Fleiss, et al. Intraclass correlations: uses in assessing rater reliability. Psychological Bulletin, 1979.

[20] S. D. Walter, et al. Estimation of test error rates, disease prevalence and relative risk from misclassified data: a review. Journal of Clinical Epidemiology, 1988.

[21] J. Jass, et al. Observer study of the grading of dysplasia in ulcerative colitis: comparison with clinical outcome. Human Pathology, 1989.

[22] G. Rae. The Equivalence of Multiple Rater Kappa Statistics and Intraclass Correlation Coefficients. 1988.