Estimating Diagnostic Test Accuracy Using a "Fuzzy Gold Standard"

This study uses Monte Carlo methods to analyze the consequences of a criterion standard ("gold standard") that contains some error when the accuracy of a diagnostic test is assessed using ROC curves. Two phenomena emerge: 1) when diagnostic-test errors are statistically independent of the inaccurate ("fuzzy") gold standard (FGS) errors, estimated test accuracy declines; 2) when the test and the FGS have statistically dependent errors, test accuracy can be overstated. Two methods are proposed to eliminate the first of these biases, and the risk that they exacerbate the second is explored. Both require a probabilistic (rather than binary) gold-standard statement (e.g., the probability that each case is abnormal). The more promising of these, the "two-truth" method, selectively eliminates the cases in which the gold standard is most ambiguous (probability near 0.5). When diagnostic-test and FGS errors are independent, this approach can eliminate much of the downward bias caused by FGS error without meaningful risk of overstating test accuracy. When the test and the FGS have dependent errors, the resultant upward bias can cause test accuracy to be overstated, in the most extreme cases even before the offsetting "two-truth" approach is employed. Key words: ROC curves; diagnostic test accuracy; technology assessment. (Med Decis Making 1995;15:44-57)
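The independent-error case can be illustrated with a small Monte Carlo sketch (not taken from the paper; the binormal test model, the 0.8/0.2 panel-probability parameters, and the 0.3/0.7 ambiguity cutoffs are all invented here for illustration). It simulates a probabilistic gold standard, dichotomizes it at 0.5 to form a conventional FGS, and then applies a "two-truth"-style exclusion of ambiguous cases:

```python
import random

random.seed(0)

def auc(scores, labels):
    """Empirical AUC via the Mann-Whitney U statistic."""
    pos = [s for s, l in zip(scores, labels) if l]
    neg = [s for s, l in zip(scores, labels) if not l]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

N = 2000
# true disease status, prevalence 0.5
truth = [random.random() < 0.5 for _ in range(N)]
# binormal diagnostic test: abnormal cases score higher on average
score = [random.gauss(1.4 if t else 0.0, 1.0) for t in truth]
# probabilistic gold standard: a noisy per-case panel probability of abnormality,
# with noise independent of the test's noise (the independent-error case)
p_abn = [min(1.0, max(0.0, random.gauss(0.8 if t else 0.2, 0.2)))
         for t in truth]

# conventional FGS: dichotomize the panel probability at 0.5
fgs = [p >= 0.5 for p in p_abn]

auc_true = auc(score, truth)   # accuracy against the (unobservable) truth
auc_fgs = auc(score, fgs)      # accuracy against the fuzzy gold standard

# "two-truth"-style exclusion: drop cases whose panel probability is
# ambiguous (near 0.5), keeping only the confidently labeled ones
kept = [(s, p >= 0.5) for s, p in zip(score, p_abn)
        if p <= 0.3 or p >= 0.7]
auc_two = auc(*zip(*kept))
```

With independent errors, `auc_fgs` comes out below `auc_true` (the downward bias the study describes), while `auc_two` recovers most of that loss, because the retained cases are rarely mislabeled. Making the panel noise depend on the test score would instead produce the upward bias of the dependent-error case.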
