Assessing Rater Performance without a "Gold Standard" Using Consensus Theory

This study illustrates the use of consensus theory to assess the diagnostic performances of raters and to estimate case diagnoses in the absence of a criterion or "gold" standard. A description is provided of how consensus theory "pools" information provided by raters, estimating rater competencies and differentially weighting their responses. Although the model assumes that raters respond without bias (i.e., sensitivity = specificity), a Monte Carlo simulation with 1,200 data sets shows that model estimates appear to be robust even with bias. The model is illustrated on a set of elbow radiographs, and consensus-model estimates are compared with those obtained from follow-up data. Results indicate that with high rater competencies, the model retrieves accurate estimates of competency and case diagnoses even when raters' responses are biased. Key words: clinical competence; interobserver variation; diagnostic evaluation; models, mathematical; consensus theory. (Med Decis Making 1997;17:71-79)
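The pooling idea in the abstract can be sketched in a few lines. What follows is a minimal illustration under stated assumptions, not the authors' procedure: it assumes dichotomous ratings, no response bias (sensitivity = specificity), a guessing rate of 1/2 when a rater does not know the answer, and equal prior probability for the two diagnoses. Under those assumptions the chance-corrected agreement between raters i and j is approximately the product of their competencies, so competencies can be recovered by fitting a rank-one structure to the off-diagonal agreements. The function names (`simulate`, `estimate_competence`, `pooled_diagnoses`) and the simple alternating least-squares fit are hypothetical choices made for this example.

```python
import math
import random


def simulate(competence, n_cases, seed=0):
    """Simulate unbiased raters: each knows the answer with probability D, else guesses."""
    rng = random.Random(seed)
    truth = [rng.randint(0, 1) for _ in range(n_cases)]
    responses = []
    for d in competence:
        row = [t if rng.random() < d else rng.randint(0, 1) for t in truth]
        responses.append(row)
    return truth, responses


def estimate_competence(responses, n_iter=50):
    """Estimate competencies from inter-rater agreement alone (no answer key)."""
    n, m = len(responses), len(responses[0])
    # Chance-corrected agreement: 2 * (match proportion) - 1 approximates D_i * D_j.
    agree = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                matches = sum(a == b for a, b in zip(responses[i], responses[j]))
                agree[i][j] = 2.0 * matches / m - 1.0
    # Alternating least squares for a rank-one fit that ignores the diagonal.
    d = [0.5] * n
    for _ in range(n_iter):
        for i in range(n):
            num = sum(agree[i][j] * d[j] for j in range(n) if j != i)
            den = sum(d[j] ** 2 for j in range(n) if j != i)
            d[i] = min(max(num / den, 0.01), 0.99)  # keep estimates inside (0, 1)
    return d


def pooled_diagnoses(responses, competence):
    """Weight each rater's vote by the log-odds of being correct (equal priors assumed)."""
    # With guessing rate 1/2, P(correct) = (1 + D) / 2, so the log-odds is ln((1+D)/(1-D)).
    weights = [math.log((1 + d) / (1 - d)) for d in competence]
    calls = []
    for k in range(len(responses[0])):
        score = sum(w if r[k] == 1 else -w for w, r in zip(weights, responses))
        calls.append(1 if score > 0 else 0)
    return calls


if __name__ == "__main__":
    truth, responses = simulate([0.9, 0.8, 0.7, 0.6, 0.3], n_cases=400)
    est = estimate_competence(responses)
    calls = pooled_diagnoses(responses, est)
    accuracy = sum(c == t for c, t in zip(calls, truth)) / len(truth)
    print("estimated competencies:", [round(x, 2) for x in est])
    print("pooled accuracy:", round(accuracy, 3))
```

The simulated `truth` vector is used only to score the pooled calls afterward; the estimation itself sees nothing but the raters' responses, which is the point of working without a gold standard.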
