An ROC-type measure of diagnostic accuracy when the gold standard is continuous-scale

ROC curves and summary measures of accuracy derived from them, such as the area under the ROC curve, have become the standard for describing and comparing the accuracy of diagnostic tests. Methods for estimating ROC curves rely on the existence of a gold standard that dichotomizes patients into disease present or disease absent. There are, however, many diagnostic tests whose gold standards are not binary-scale but continuous-scale. Artificially dichotomizing these gold standards leads to bias and inconsistency in estimates of diagnostic accuracy. In this paper, we propose a non-parametric estimator of diagnostic test accuracy that does not require dichotomization of the gold standard. This estimator has an interpretation analogous to the area under the ROC curve. We propose a confidence interval for test accuracy and a statistical test for comparing the accuracies of tests from paired designs. We compare the performance (i.e., confidence interval coverage, type I error rate, and power) of the proposed methods with several alternatives. We present an example in which the accuracies of two quick blood tests for measuring serum iron concentrations are estimated and compared.
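To make the idea of an AUC-like index without dichotomization concrete, the following is a minimal sketch of a pairwise concordance index for a continuous gold standard. It is an illustrative assumption, not necessarily the estimator proposed in the paper: the function name `concordance_index`, the 0.5 score for ties, and the toy numbers are all hypothetical. The index estimates the probability that, for two randomly chosen patients, the test orders them the same way the gold standard does.

```python
from itertools import combinations


def concordance_index(test, gold):
    """Share of patient pairs that the test and the gold standard order alike.

    Assumption (not from the paper): a pair tied on either the test
    result or the gold standard contributes 0.5.
    """
    if len(test) != len(gold):
        raise ValueError("test and gold must have the same length")

    total = 0.0
    n_pairs = 0
    # Visit every unordered pair of patients exactly once.
    for (x_i, t_i), (x_j, t_j) in combinations(zip(test, gold), 2):
        n_pairs += 1
        product = (x_i - x_j) * (t_i - t_j)
        if product > 0:      # same ordering on test and gold standard
            total += 1.0
        elif product == 0:   # tie on the test or on the gold standard
            total += 0.5
        # product < 0: discordant pair, contributes 0

    if n_pairs == 0:
        raise ValueError("at least two patients are required")
    return total / n_pairs


# Hypothetical values: a quick test versus a continuous reference measurement.
quick_test = [1.0, 2.0, 3.0, 4.0]
reference = [1.0, 3.0, 2.0, 4.0]
print(concordance_index(quick_test, reference))
```

As with the area under the ROC curve, a value of 1 would indicate that the test orders every pair as the gold standard does, and a value near 0.5 would indicate ordering no better than chance.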
