Coefficients of agreement for fixed observers

Agreement between fixed observers or methods that produce readings on a continuous scale is usually evaluated via one of several intraclass correlation coefficients (ICCs). This article raises and discusses several related issues that have not been addressed previously. ICCs are usually presented in the context of a two-way analysis of variance (ANOVA) model. We argue that the ANOVA model makes inadequate assumptions, such as homogeneity of the error variances and of the pairwise correlation coefficients between observers. We then present the concept of observer relational agreement, which has been used in the social sciences to derive the common ICCs without making the restrictive ANOVA assumptions; this concept has received little attention in the biomedical literature. When observer agreement is defined in terms of the difference between the readings of different observers on the same subject (absolute agreement), the corresponding relational agreement coefficient coincides with the concordance correlation coefficient (CCC), which is itself an ICC. The CCC, which has gained popularity over the past 15 years, compares the mean squared difference between readings of observers on the same subject with the expected value of this quantity under 'chance agreement', defined as independence between observers. We argue that the assumption of independence is unrealistic in this context and present a new coefficient that is not based on the concept of chance agreement.
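To make the CCC's construction concrete, the following is a minimal sketch of its sample version for two observers, using plug-in moment estimators; the function name and the use of biased (1/n) moments, as in Lin (1989), are illustrative choices, not part of this article.

```python
import numpy as np

def concordance_correlation(x, y):
    """Sample concordance correlation coefficient (CCC) for two observers.

    The CCC can be written as 1 - E[(X - Y)^2] / E_indep[(X - Y)^2],
    where the denominator is the expected squared difference if the two
    observers' readings were independent ('chance agreement').  This
    simplifies to 2*cov(X, Y) / (var(X) + var(Y) + (mean(X) - mean(Y))^2).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()              # biased (1/n) moment estimators
    cov = ((x - mx) * (y - my)).mean()
    return 2.0 * cov / (vx + vy + (mx - my) ** 2)
```

Perfect agreement (identical readings) yields a CCC of 1; the coefficient is penalized both for low correlation and for location or scale shifts between the two observers.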
