Comparative statistical properties of expected utility and area under the ROC curve for laboratory studies of observer performance in screening mammography.

RATIONALE AND OBJECTIVES Our objective is to determine whether expected utility (EU) and the area under the receiver operating characteristic curve (AUC) are consistent with one another as endpoints of observer performance studies in mammography. These two measures summarize receiver operating characteristic (ROC) performance somewhat differently. We compare the two study endpoints at the level of individual reader effects, statistical inference, and components of variance across readers and cases.

MATERIALS AND METHODS We reanalyze three previously published laboratory observer performance studies that investigate various x-ray breast imaging modalities, using both EU and AUC as endpoints. The EU measure is based on recent estimates of relative utility for screening mammography.

RESULTS The AUC and EU measures are correlated across readers for individual modalities (r = 0.93) and for differences between modalities (r = 0.94 to 0.98). Statistical inference for modality effects based on multi-reader multi-case analysis is very similar for the two measures, with significant results (P < .05) in exactly the same conditions. Power analyses show mixed results across studies, with a small increase in power on average for EU that corresponds to approximately a 7% reduction in the number of readers. Despite a large number of crossing ROC curves (59% of readers), modality effects only rarely have opposite signs for EU and AUC (6%).

CONCLUSIONS We do not find any evidence of systematic differences between EU and AUC in screening mammography observer studies. Thus, when utility approaches are viable (i.e., an appropriate value of relative utility exists), practical effects such as statistical efficiency may be used to choose study endpoints.
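The abstract does not state how either endpoint is computed, so the following is only an illustrative sketch of how the two summaries can differ for a single reader. It assumes an empirical ROC curve built from rating scores, a trapezoidal AUC, and a utility endpoint taken as the maximum of TPF minus beta times FPF over the empirical operating points. The function names, the example scores, and the slope `beta` (which would in practice be derived from disease prevalence and a relative utility estimate) are all hypothetical and are not taken from the study.

```python
def empirical_roc(neg_scores, pos_scores):
    """Return empirical ROC points (FPF, TPF), running from (0, 0) to (1, 1)."""
    # Sweep the decision threshold from the highest observed score downward.
    thresholds = sorted(set(neg_scores) | set(pos_scores), reverse=True)
    points = [(0.0, 0.0)]
    for t in thresholds:
        fpf = sum(s >= t for s in neg_scores) / len(neg_scores)
        tpf = sum(s >= t for s in pos_scores) / len(pos_scores)
        points.append((fpf, tpf))
    return points


def auc_trapezoid(points):
    """Area under the piecewise-linear curve through the ROC points."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2.0
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


def utility_endpoint(points, beta):
    """Best value of TPF - beta * FPF over the empirical operating points.

    Assumption: expected utility is summarized, up to an affine rescaling,
    by the operating point that maximizes this quantity for a given slope.
    """
    return max(tpf - beta * fpf for fpf, tpf in points)


# Hypothetical 5-point rating data for one reader (not from the study).
neg = [1, 1, 2, 2, 3]   # ratings given to non-cancer cases
pos = [2, 3, 4, 4, 5]   # ratings given to cancer cases

roc = empirical_roc(neg, pos)
auc = auc_trapezoid(roc)
eu = utility_endpoint(roc, beta=1.0)  # beta = 1.0 is an arbitrary placeholder
print(f"AUC = {auc:.3f}, utility endpoint = {eu:.3f}")
```

AUC integrates over the whole curve, whereas the utility endpoint depends only on the region near the operating point favored by `beta`. This is why two readers or modalities with crossing ROC curves could, in principle, be ranked differently by the two measures.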
