How many raters? Toward the most reliable diagnostic consensus.

When a binary decision (whether to treat a patient, whether to enter or withdraw a patient from a clinical trial, and so on) must rest on a diagnosis of unsatisfactory reliability, can a consensus diagnosis be used to improve reliability? If so, exactly how? These are the questions I address here. I compare and contrast the known results for an interval consensus with those for a binary consensus, and suggest tactics for use in a pilot study to answer these questions.
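
As an illustration of the interval-consensus case, the classical result is presumably the Spearman-Brown formula: if a single rater has reliability r and the consensus is the mean of m parallel raters, the consensus reliability is m r / (1 + (m - 1) r). The sketch below assumes that setting; the function names `spearman_brown` and `raters_needed` are hypothetical, chosen only for this example. It says nothing about a binary consensus, which need not follow the same formula.

```python
import math


def spearman_brown(r: float, m: int) -> float:
    """Reliability of the mean of m parallel raters (Spearman-Brown).

    r is the single-rater reliability, assumed to lie in [0, 1].
    Assumption: interval-scale ratings averaged across raters.
    """
    if not 0.0 <= r <= 1.0:
        raise ValueError("r must lie in [0, 1]")
    if m < 1:
        raise ValueError("m must be at least 1")
    return m * r / (1.0 + (m - 1) * r)


def raters_needed(r: float, target: float) -> int:
    """Smallest m whose averaged consensus reaches the target reliability.

    Obtained by solving the Spearman-Brown formula for m:
        m = target * (1 - r) / (r * (1 - target))
    """
    if not (0.0 < r < 1.0 and 0.0 < target < 1.0):
        raise ValueError("r and target must lie strictly between 0 and 1")
    m = target * (1.0 - r) / (r * (1.0 - target))
    # Small tolerance so floating-point noise does not round an exact
    # integer solution up to the next integer.
    return max(1, math.ceil(m - 1e-12))


if __name__ == "__main__":
    for m in (1, 2, 4, 8):
        print(m, round(spearman_brown(0.5, m), 3))
    print("raters needed:", raters_needed(0.5, 0.8))
```

Under these assumptions, the returns from adding raters diminish: each additional rater raises the consensus reliability by less than the one before, which is one reason a pilot study is useful for choosing m.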
