Percent Agreement, Pearson's Correlation, and Kappa as Measures of Inter-examiner Reliability

Percent agreement and Pearson's correlation coefficient are frequently used to represent inter-examiner reliability, but both measures can be misleading. The use of percent agreement to measure inter-examiner agreement should be discouraged, because it does not take into account the agreement due solely to chance. Pearson's correlation must be interpreted with caution, because it is unaffected by systematic bias between examiners. Analysis of data from a reliability study showed that, although percent agreement and kappa were consistently high among three examiners, the reliability measured by Pearson's correlation was inconsistent. This study shows that correlation and kappa can be used together to uncover non-random examiner error.
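The two cautions in the abstract can be illustrated with a minimal sketch. The scores below are invented for illustration and are not data from the study, and the function names (`percent_agreement`, `cohen_kappa`, `pearson_r`) are hypothetical helpers rather than anything the authors describe. The kappa shown is the standard two-rater Cohen's kappa, which is assumed here to be the statistic the abstract refers to.

```python
from collections import Counter
from math import sqrt


def percent_agreement(a, b):
    """Fraction of items on which two examiners give the same score."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohen_kappa(a, b):
    """Two-rater Cohen's kappa: (p_o - p_e) / (1 - p_e)."""
    n = len(a)
    p_o = percent_agreement(a, b)
    count_a, count_b = Counter(a), Counter(b)
    # Agreement expected by chance, from each examiner's marginal frequencies.
    p_e = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)


def pearson_r(a, b):
    """Pearson's product-moment correlation coefficient."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


# Hypothetical case 1: a rare finding (1 = present) scored at 20 sites.
# Each examiner records it twice, but never at the same site.
exam_a = [1, 1, 0, 0] + [0] * 16
exam_b = [0, 0, 1, 1] + [0] * 16
print("percent agreement:", percent_agreement(exam_a, exam_b))  # high
print("kappa:", cohen_kappa(exam_a, exam_b))  # below zero

# Hypothetical case 2: examiner C reads every depth 1 unit higher than A.
depth_a = [2, 3, 4, 5, 6]
depth_c = [x + 1 for x in depth_a]
print("pearson r:", pearson_r(depth_a, depth_c))  # perfect correlation
print("percent agreement:", percent_agreement(depth_a, depth_c))  # no agreement
```

In the first case the examiners agree on most sites only because the finding is rare, so kappa falls below zero once chance agreement is removed. In the second case a constant offset leaves Pearson's correlation at its maximum even though the examiners never record the same value, which is why the measures are more informative when read together.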
