Paper information - Exploration of Analysis Methods for Diagnostic Imaging Tests: Problems with ROC AUC and Confidence Scores in CT Colonography


Background: Different methods of evaluating diagnostic performance when comparing diagnostic tests may lead to different results. We compared two such approaches, sensitivity and specificity versus area under the Receiver Operating Characteristic curve (ROC AUC), for the evaluation of CT colonography for the detection of polyps, either with or without computer assisted detection.

Methods: In a multireader multicase study of 10 readers and 107 cases we compared sensitivity and specificity, using radiological reporting of the presence or absence of polyps, to ROC AUC calculated from confidence scores concerning the presence of polyps. Both methods were assessed against a reference standard. Here we focus on five readers, selected to illustrate issues in design and analysis. We compared diagnostic measures within readers, showing that differences in results are due to statistical methods.

Results: Reader performance varied widely depending on whether sensitivity and specificity or ROC AUC was used. There were several problems with using confidence scores: assigning scores to all cases; the use of zero scores when no polyps were identified; the bimodal, non-normal distribution of scores; fitting ROC curves, which required extrapolation beyond the study data; and the undue influence of a few false positive results. Variation due to use of different ROC methods exceeded differences between test results for ROC AUC.

Conclusions: The confidence scores recorded in our study violated many assumptions of ROC AUC methods, rendering these methods inappropriate. The problems we identified will apply to other detection studies using confidence scores. We found sensitivity and specificity were a more reliable and clinically appropriate method to compare diagnostic tests.
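To make the contrast between the two measures concrete, the following is a minimal sketch, not the authors' analysis. It computes sensitivity and specificity from binary reads and an empirical (trapezoidal, Mann-Whitney) ROC AUC from confidence scores. The function names, the 0 to 100 score scale, and the ten cases are all hypothetical, invented purely for illustration; the sketch assumes a score of zero means "no polyp reported".

```python
def sensitivity_specificity(calls, truth):
    """Sensitivity and specificity from binary calls against a reference standard."""
    tp = sum(1 for c, t in zip(calls, truth) if c and t)
    fn = sum(1 for c, t in zip(calls, truth) if not c and t)
    tn = sum(1 for c, t in zip(calls, truth) if not c and not t)
    fp = sum(1 for c, t in zip(calls, truth) if c and not t)
    return tp / (tp + fn), tn / (tn + fp)


def empirical_auc(scores, truth):
    """Empirical ROC AUC: the fraction of (positive, negative) case pairs in which
    the positive case has the higher score, counting ties as one half."""
    pos = [s for s, t in zip(scores, truth) if t]
    neg = [s for s, t in zip(scores, truth) if not t]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical data: 4 cases with polyps, 6 without (1 = polyp present).
truth = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

# Scenario A: two polyps found, two missed (score 0), one false positive scored 70.
scores_a = [90, 80, 0, 0, 0, 0, 0, 0, 0, 70]
# Scenario B: identical reads, but the single false positive is scored 95.
scores_b = [90, 80, 0, 0, 0, 0, 0, 0, 0, 95]

for label, scores in (("A", scores_a), ("B", scores_b)):
    calls = [s > 0 for s in scores]  # assumption: any non-zero score is a positive read
    sens, spec = sensitivity_specificity(calls, truth)
    print(label, round(sens, 3), round(spec, 3), round(empirical_auc(scores, truth), 3))
```

With these made-up numbers the binary reads are the same in both scenarios, so sensitivity and specificity do not change, while the empirical AUC falls from 17/24 to 15/24 solely because one false positive was given a higher confidence score. The many tied zero scores also show why the distribution of scores is far from the smooth, normal-like shape that parametric ROC fitting assumes.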
