Statistical properties of a utility measure of observer performance compared to area under the ROC curve

The receiver operating characteristic (ROC) curve has become a common tool for evaluating diagnostic imaging technologies, and the primary endpoint of such evaluations is the area under the curve (AUC), which integrates sensitivity over the entire false-positive range. An alternative figure of merit for ROC studies is expected utility (EU), which focuses on the relevant region of the ROC curve as defined by disease prevalence and the relative utility of the task. However, if this measure is to be used, it must also have desirable statistical properties to keep the burden of observer performance studies as low as possible. Here, we evaluate effect size and variability for EU and AUC. We use two observer performance studies recently submitted to the FDA to compare the EU and AUC endpoints. The studies were conducted using the multi-reader multi-case methodology, in which all readers score all cases in all modalities. ROC curves from the studies were used to generate both AUC and EU values for each reader and modality. The EU measure was computed assuming an iso-utility slope of 1.03. We find mean effect sizes, the reader-averaged differences between modalities, to be roughly 2.0 times as large for EU as for AUC, while the standard deviation across readers is roughly 1.4 times as large, suggesting better statistical properties for the EU endpoint. In a simple power analysis of paired comparisons across readers, the utility measure required on average 36% fewer readers than AUC to achieve 80% statistical power.
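As a rough illustration of the quantities above, the sketch below computes AUC and an EU-style figure of merit for a binormal ROC model, together with the normal-approximation sample size for a paired comparison across readers. This is a hedged sketch, not the paper's actual procedure: it assumes EU can be taken as the maximum of TPF − β·FPF over operating points (which, for a concave ROC, is attained where the ROC slope equals the iso-utility slope β), and the binormal parameters and effect sizes used in the example are hypothetical.

```python
import math

def Phi(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def auc_binormal(mu, sigma):
    """Closed-form AUC for a binormal observer:
    signal ~ N(mu, sigma^2), noise ~ N(0, 1)."""
    return Phi(mu / math.sqrt(1.0 + sigma * sigma))

def expected_utility(mu, sigma, beta=1.03, n_grid=2001):
    """EU-style figure of merit: max over thresholds of TPF - beta*FPF.
    For a concave binormal ROC this maximum occurs at the operating
    point where the ROC slope equals the iso-utility slope beta
    (1.03 is the slope quoted in the abstract)."""
    best = 0.0
    for i in range(n_grid):
        t = -6.0 + 12.0 * i / (n_grid - 1)      # threshold sweep
        fpf = 1.0 - Phi(t)                       # false-positive fraction
        tpf = 1.0 - Phi((t - mu) / sigma)        # true-positive fraction
        best = max(best, tpf - beta * fpf)
    return best

def readers_for_power(delta, sd, alpha=0.05, power=0.80):
    """Normal-approximation reader count for a two-sided paired
    comparison: n = ((z_{1-alpha/2} + z_{power}) * sd / delta)^2."""
    z_a = 1.959964   # z_{0.975}
    z_b = 0.841621   # z_{0.80}
    return math.ceil(((z_a + z_b) * sd / delta) ** 2)

# Hypothetical example: two modalities with binormal detectabilities
# mu = 1.0 and mu = 1.2 (sigma = 1); compare the per-modality endpoints
# and the readers needed for a hypothetical effect size and reader SD.
for mu in (1.0, 1.2):
    print(f"mu={mu}: AUC={auc_binormal(mu, 1.0):.3f}, "
          f"EU={expected_utility(mu, 1.0):.3f}")
print("readers needed:", readers_for_power(delta=0.02, sd=0.03))
```

Because the required reader count scales as (sd/delta)^2, an endpoint whose effect size grows faster than its between-reader variability (here, roughly 2.0× versus 1.4×) reduces the number of readers needed for a fixed power.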
