A variance-based measure of inter-rater agreement in medical databases

The increasing use of encoded medical data requires flexible tools for data quality assessment. Existing methods are not always adequate, and this paper proposes a new metric for inter-rater agreement of aggregated diagnostic data. The metric, which is applicable in prospective as well as retrospective coding studies, quantifies the variability in the coding scheme, and the variation can be broken down by category and by coder. Five alternative definitions of the metric were compared in a set of simulated coding situations and in the context of mortality statistics. Two of them were more effective than the others, and the choice between the two depends on the situation. The metric is more powerful for larger numbers of coded cases, and Type I errors are frequent when the coding situations being compared include different numbers of cases. We also show that variation is difficult to interpret when the structures of the compared coding schemes differ.
