Statistical Comparison of Two ROC-curve Estimates Obtained from Partially-paired Datasets

The authors propose a new generalized method for ROC-curve fitting and statistical testing that allows researchers to utilize all of the data collected in an experimental comparison of two diagnostic modalities, even if some patients have not been studied with both modalities. Their new algorithm, ROCKIT, subsumes previous algorithms as special cases. It conducts all analyses available from previous ROC software and provides 95% confidence intervals for all estimates. ROCKIT was tested on more than half a million computer-simulated datasets of various sizes and configurations representing a range of population ROC curves. The algorithm successfully converged for more than 99.8% of all datasets studied. The type I error rates of the new algorithm's statistical test for differences in Az estimates were excellent for datasets typically encountered in practice, but diverged from alpha for datasets arising from some extreme situations. Key words: receiver operating characteristic (ROC) analysis; maximum-likelihood estimation; partially-paired data; missing data. (Med Decis Making 1998;18:110-121)
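As a rough illustration of the quantity being tested, the sketch below computes the area index Az under the conventional binormal ROC model and a two-sided z-test for the difference between two Az estimates. It is not the ROCKIT algorithm: ROCKIT obtains the estimates, their standard errors, and their covariance by maximum-likelihood fitting, whereas this sketch assumes those values are already available. The function names and the parameters `a`, `b`, `se1`, `se2`, and `cov` are hypothetical, chosen only for this example.

```python
import math


def normal_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def binormal_az(a: float, b: float) -> float:
    """Area under a binormal ROC curve with intercept a and slope b.

    Assumes the conventional binormal parameterization,
    for which Az = Phi(a / sqrt(1 + b^2)).
    """
    return normal_cdf(a / math.sqrt(1.0 + b * b))


def az_difference_z_test(az1: float, se1: float,
                         az2: float, se2: float,
                         cov: float = 0.0) -> tuple[float, float]:
    """Two-sided z-test for az1 - az2.

    cov is the covariance between the two estimates (assumed input).
    It is zero for fully unpaired data and generally nonzero when
    some or all cases were studied with both modalities.
    """
    var_diff = se1 ** 2 + se2 ** 2 - 2.0 * cov
    if var_diff <= 0.0:
        raise ValueError("variance of the difference must be positive")
    z = (az1 - az2) / math.sqrt(var_diff)
    p_value = 2.0 * (1.0 - normal_cdf(abs(z)))
    return z, p_value


if __name__ == "__main__":
    # Illustrative inputs only; these are not values from the paper.
    az_a = binormal_az(1.5, 0.9)
    az_b = binormal_az(1.2, 1.0)
    z, p = az_difference_z_test(az_a, 0.03, az_b, 0.035, cov=0.0004)
    print(f"Az(A)={az_a:.3f}  Az(B)={az_b:.3f}  z={z:.2f}  p={p:.3f}")
```

A positive covariance reduces the variance of the difference, so under this sketch's assumptions the paired portion of a dataset sharpens the comparison relative to treating every case as unpaired.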
