Measures of Agreement and Concordance With Clinical Research Applications

This article reviews measures of interrater agreement, including the complementary roles of tests for interrater bias and of estimates of kappa statistics and intraclass correlation coefficients (ICCs), following the developments outlined by Landis and Koch (1977a; 1977b; 1977c). Category-specific measures of reliability, together with pairwise measures of disagreement among categories, are extended to accommodate multistage research designs involving unbalanced data. The covariance structure of these category-specific agreement and pairwise disagreement coefficients is summarized for use in modeling and hypothesis testing. These agreement/disagreement measures of intraclass/interclass correlation are then estimated using specialized software and illustrated with several clinical research applications. Measures of agreement for continuous data are also considered, namely the concordance correlation coefficient (CCC) originally developed by Lin (1989). King and Chinchilli (2001b) extended the CCC to a generalized concordance correlation coefficient that is appropriate for both continuous and categorical data. This generalized coefficient is reviewed and its use is illustrated with clinical research data. Additional extensions of the CCC methodology to longitudinal studies are also summarized.
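As a minimal illustration of the concordance correlation coefficient for continuous data, the sketch below computes the sample CCC for two sets of paired measurements using the commonly cited form attributed to Lin (1989): twice the covariance divided by the sum of the two variances and the squared difference in means. The function name `concordance_correlation` and the example data are hypothetical and do not come from the article; the 1/n (rather than 1/(n-1)) moment estimators are an assumption of this sketch, and the code does not reproduce the specialized software mentioned above.

```python
from statistics import fmean


def concordance_correlation(x, y):
    """Sample concordance correlation coefficient for paired measurements.

    Illustrative sketch only: uses 1/n moment estimators (an assumption)
    and the form 2*s_xy / (s_x^2 + s_y^2 + (mean_x - mean_y)^2).
    """
    if len(x) != len(y):
        raise ValueError("x and y must contain the same number of observations")
    n = len(x)
    if n < 2:
        raise ValueError("at least two paired observations are required")

    mean_x, mean_y = fmean(x), fmean(y)

    # Variances and covariance with divisor n.
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n

    # The squared mean difference penalizes systematic shift between methods.
    denominator = var_x + var_y + (mean_x - mean_y) ** 2
    if denominator == 0:
        raise ValueError("CCC is undefined when both series are constant and equal")
    return 2 * cov_xy / denominator


if __name__ == "__main__":
    method_a = [1.0, 2.0, 3.0, 4.0, 5.0]      # hypothetical readings
    method_b = [2.0, 3.0, 4.0, 5.0, 6.0]      # same readings shifted by one unit
    print(concordance_correlation(method_a, method_a))
    print(concordance_correlation(method_a, method_b))
```

In this hypothetical example the second series is the first shifted by one unit, so the two are perfectly linearly related, yet the CCC is 0.8 rather than 1 because the squared mean difference enters the denominator. This is the sense in which the CCC measures agreement rather than mere association.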

[1]  Sean D Sullivan,et al.  Daily versus as-needed corticosteroids for mild persistent asthma. , 2005, The New England journal of medicine.

[2]  Bret Larget,et al.  Analysis of Categorical Data , 2002 .

[3]  D. Altman,et al.  STATISTICAL METHODS FOR ASSESSING AGREEMENT BETWEEN TWO METHODS OF CLINICAL MEASUREMENT , 1986, The Lancet.

[4]  Dana Quade,et al.  Nonparametric Partial Correlation , 1967 .

[5]  D. Quade,et al.  On Comparing the Correlations within Two Pairs of Variables , 1968 .

[6]  Vernon M Chinchilli,et al.  A repeated measures concordance correlation coefficient , 2007, Statistics in medicine.

[7]  B. Everitt,et al.  COMPARING THE MARGINAL TOTALS OF SQUARE CONTINGENCY TABLES , 1971 .

[8]  J. Richard Landis,et al.  A Note on the Equivalence of Several Marginal Homogeneity Test Criteria for Categorical Data , 1982 .

[9]  W. G. Cochran The comparison of percentages in matched samples. , 1950, Biometrika.

[10]  E. Vonesh,et al.  Goodness-of-fit in generalized nonlinear mixed-effects models. , 1996, Biometrics.

[11]  M. Friedman The Use of Ranks to Avoid the Assumption of Normality Implicit in the Analysis of Variance , 1937 .

[12]  G G Koch,et al.  A general methodology for the analysis of experiments with repeated measurement of categorical data. , 1977, Biometrics.

[13]  J. Fleiss,et al.  The equivalence of the generalized McNemar tests for marginal homogeneity in 2³ and 3² tables. , 1975, Biometrics.

[14]  L. Kurland,et al.  Studies on multiple sclerosis in Winnipeg, Manitoba, and New Orleans, Louisiana. I. Prevalence; comparison between the patient groups in Winnipeg and New Orleans. , 1953, American journal of hygiene.

[15]  L. Lin Assay Validation Using the Concordance Correlation Coefficient , 1992 .

[16]  Gary G. Koch,et al.  Analysis of categorical data , 1985 .

[17]  A. Madansky TESTS OF HOMOGENEITY FOR CORRELATED SAMPLES , 1963 .

[18]  D. Gaylor,et al.  Variance Component Testing in Unbalanced Nested Designs , 1974 .

[19]  W. G. Cochran Some Methods for Strengthening the Common χ² Tests , 1954 .

[20]  Jacob Cohen,et al.  Weighted kappa: Nominal scale agreement provision for scaled disagreement or partial credit. , 1968 .

[21]  S Kumanyika,et al.  A weighted concordance correlation coefficient for repeated measurement designs. , 1996, Biometrics.

[22]  J. Fleiss Measuring nominal scale agreement among many raters. , 1971 .

[23]  J. R. Landis,et al.  A general overview of Mantel-Haenszel methods: applications and recent developments. , 1988, Annual review of public health.

[24]  M. Kenward,et al.  An Introduction to the Bootstrap , 2007 .

[25]  M. W. Birch The Detection of Partial Association, II: The General Case , 1965 .

[26]  Á. M. Fidalgo Mantel–Haenszel Methods , 2005 .

[27]  Lluís Jover,et al.  Estimating the Generalized Concordance Correlation Coefficient through Variance Components , 2003, Biometrics.

[28]  W. Haenszel,et al.  Statistical aspects of the analysis of data from retrospective studies of disease. , 1959, Journal of the National Cancer Institute.

[29]  Michael Friendly,et al.  Visualizing Categorical Data , 2009, Encyclopedia of Database Systems.

[30]  Jacob Cohen,et al.  The Equivalence of Weighted Kappa and the Intraclass Correlation Coefficient as Measures of Reliability , 1973 .

[31]  Gary G. Koch,et al.  Analysis of Rank Measures of Association for Ordinal Data from Longitudinal Studies , 1989 .

[32]  T. Allison,et al.  A New Procedure for Assessing Reliability of Scoring EEG Sleep Recordings , 1971 .

[33]  Q. Mcnemar Note on the sampling error of the difference between correlated proportions or percentages , 1947, Psychometrika.

[34]  J. Darroch,et al.  The Mantel-Haenszel Test and Tests of Marginal Symmetry; Fixed-Effects and Mixed Models for a Categorical Response, Correspondent Paper , 1981 .

[35]  W. Hoeffding A Class of Statistics with Asymptotically Normal Distribution , 1948 .

[36]  G G Koch,et al.  Some general methods for the analysis of categorical data in longitudinal studies. , 1988, Statistics in medicine.

[37]  N. Mantel,et al.  Marginal homogeneity, symmetry, and independence , 1978 .

[38]  J. Rao On Expectations, Variances, and Covariances of ANOVA Mean Squares by 'Synthesis' , 1968 .

[39]  C. Nickerson A note on a concordance correlation coefficient to evaluate reproducibility , 1997 .

[40]  M. W. Birch The Detection of Partial Association, I: The 2 × 2 Case , 1964 .

[41]  J. R. Landis,et al.  The measurement of observer agreement for categorical data. , 1977, Biometrics.

[42]  B. Everitt,et al.  Large sample standard errors of kappa and weighted kappa. , 1969 .

[43]  J. R. Landis,et al.  An application of hierarchical kappa-type statistics in the assessment of majority agreement among multiple observers. , 1977, Biometrics.

[44]  Jacob Cohen A Coefficient of Agreement for Nominal Scales , 1960 .

[45]  Hung-Mo Lin,et al.  Resampling Dependent Concordance Correlation Coefficients , 2007, Journal of biopharmaceutical statistics.

[46]  Brian Everitt,et al.  MOMENTS OF THE STATISTICS KAPPA AND WEIGHTED KAPPA , 1968 .

[47]  Hung-Mo Lin,et al.  Computer programs for the concordance correlation coefficient , 2007, Comput. Methods Programs Biomed..

[48]  A. S. Hedayat,et al.  A Unified Approach for Assessing Agreement for Continuous and Categorical Data , 2007, Journal of biopharmaceutical statistics.

[49]  John J. Gart,et al.  THE COMPARISON OF PROPORTIONS: A REVIEW OF SIGNIFICANCE TESTS, CONFIDENCE INTERVALS AND ADJUSTMENTS FOR STRATIFICATION , 1971 .

[50]  Huiman X Barnhart,et al.  Overall Concordance Correlation Coefficient for Evaluating Agreement Among Multiple Observers , 2002, Biometrics.

[51]  J. R. Landis,et al.  A one-way components of variance model for categorical data , 1977 .

[52]  J. Richard Landis,et al.  Large sample variance of kappa in the case of different sets of raters. , 1979 .

[53]  Frederick R. Forst,et al.  On robust estimation of the location parameter , 1980 .

[54]  V M Chinchilli,et al.  A generalized concordance correlation coefficient for continuous and categorical data , 2001, Statistics in medicine.

[55]  L. Lin,et al.  A concordance correlation coefficient to evaluate reproducibility. , 1989, Biometrics.

[56]  Tonya S. King,et al.  ROBUST ESTIMATORS OF THE CONCORDANCE CORRELATION COEFFICIENT , 2001, Journal of biopharmaceutical statistics.

[57]  N D Holmquist,et al.  Variability in classification of carcinoma in situ of the uterine cervix. , 1967, Archives of pathology.

[58]  John P. Buonaccorsi,et al.  Fieller's Theorem , 2005 .

[59]  Y. Bishop The Analysis of Categorical Data (2nd Ed.) , 1983 .

[60]  H. Barnhart,et al.  Modeling Concordance Correlation via GEE to Evaluate Reproducibility , 2001, Biometrics.