Estimating the Relative Utility of Screening Mammography

Background. Diagnostic utility is a fundamental component of signal detection theory and appears in some of its earliest works. Attaching utility values to the possible outcomes of a diagnostic test should, in principle, yield meaningful ways to evaluate and compare diagnostic systems. In many areas of medical imaging, however, utility is not used because it is presumed to be unknown.

Methods. We estimate relative utility (the utility benefit of a detection relative to that of a correct rejection) for screening mammography from its known relation to the slope of the receiver operating characteristic (ROC) curve at the optimal operating point. The approach assumes that the clinical operating point is optimal in the sense of maximizing expected utility. Under that assumption, and for a known disease prevalence, the ROC slope at the operating point implies a value of relative utility for the diagnostic task. We examine utility estimation for screening mammography using data from the Digital Mammographic Imaging Screening Trial (DMIST).

Results. We show how several conditions influence the estimated relative utility, including characteristics of the rating scale, verification time, probability model, and scope of the ROC curve fit. Across these conditions, relative utility estimates range from 66 to 227.

Conclusions. We argue for one particular set of conditions, which yields a relative utility estimate of 162 (±14%). This is broadly consistent with values for screening mammography determined previously by other means. At the disease prevalence found in the DMIST study (0.59% at 365-day verification), optimal ROC slopes are near unity, suggesting that utility-based assessments of screening mammography will be similar to those based on Youden’s index.
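The relation between ROC slope and relative utility invoked in the Methods can be illustrated with a short numerical sketch. It assumes the standard signal detection result that the expected-utility-maximizing operating point has slope (1 - p) / (p × RU), where p is disease prevalence and RU is the relative utility, taken here as (U_TP - U_FN) / (U_TN - U_FP). That reading of "relative utility" is an assumption based on the abstract's wording, and the function names are hypothetical; this is not the authors' estimation procedure, which fits ROC curves to DMIST data.

```python
def optimal_roc_slope(prevalence, relative_utility):
    """ROC slope at the expected-utility-maximizing operating point.

    Assumes relative_utility = (U_TP - U_FN) / (U_TN - U_FP).
    """
    return (1.0 - prevalence) / (prevalence * relative_utility)


def relative_utility_from_slope(prevalence, slope):
    """Invert the relation: relative utility implied by an observed ROC slope,
    assuming the observed operating point is optimal."""
    return (1.0 - prevalence) / (prevalence * slope)


# Prevalence reported in the abstract (0.59% at 365-day verification).
prevalence = 0.0059

# Relative utility values quoted in the abstract: the range endpoints
# (66, 227) and the preferred estimate (162).
for ru in (66, 162, 227):
    slope = optimal_roc_slope(prevalence, ru)
    print(f"relative utility {ru:>3} -> optimal ROC slope {slope:.2f}")
```

With the preferred estimate of 162, the implied slope is close to 1. That is the slope at which Youden's index (sensitivity + specificity - 1) is maximized on a smooth concave ROC curve, which is why the two assessments are expected to agree at this prevalence.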

[1]  N. Obuchowski ROC analysis. , 2005, AJR. American journal of roentgenology.

[2]  András Kocsor,et al.  ROC analysis: applications to the classification of biological sequences and 3D structures , 2008, Briefings Bioinform..

[3]  Andrew D. A. Maidment,et al.  Comparison of receiver operating characteristic curves on the basis of optimal operating points. , 1996, Academic radiology.

[4]  Kevin S. Berbaum,et al.  A contaminated binormal model for ROC data , 2000 .

[5]  Charles E Metz,et al.  ROC analysis in medical imaging: a tutorial review of the literature , 2008, Radiological physics and technology.

[6]  David Gur,et al.  Prevalence effect in a laboratory environment. , 2003, Radiology.

[7]  C. D'Orsi,et al.  Diagnostic Performance of Digital versus Film Mammography for Breast-Cancer Screening , 2006 .

[8]  D. McClish Analyzing a Portion of the ROC Curve , 1989, Medical decision making : an international journal of the Society for Medical Decision Making.

[9]  C. Metz Basic principles of ROC analysis. , 1978, Seminars in nuclear medicine.

[10]  D. M. Green,et al.  Signal detection theory and psychophysics , 1966 .

[11]  J. Kassirer,et al.  Therapeutic decision making: a cost-benefit analysis. , 1975, The New England journal of medicine.

[12]  Luisa P. Wallace,et al.  The "laboratory" effect: comparing radiologists' performance and variability during prospective clinical and laboratory mammography interpretations. , 2008, Radiology.

[13]  J. Boone,et al.  Determining Sensitivity of Mammography from Screening Data, Cancer Incidence, and Receiver-Operating Characteristic Curve Parameters , 2002, Medical decision making : an international journal of the Society for Medical Decision Making.

[14]  R. F. Wagner,et al.  Reader Variability in Mammography and Its Implications for Expected Utility over the Population of Readers and Cases , 2004, Medical decision making : an international journal of the Society for Medical Decision Making.

[15]  T. M. Kolb,et al.  Comparison of the performance of screening mammography, physical examination, and breast US and evaluation of factors that influence them: an analysis of 27,825 patient evaluations. , 2002, Radiology.

[16]  David Gur,et al.  The prevalence effect in a laboratory environment: Changing the confidence ratings. , 2007, Academic radiology.

[17]  David Gur,et al.  Is an ROC-type response truly always better than a binary response in observer performance studies? , 2010, Academic radiology.

[18]  R. Birdwell,et al.  Comparison of Digital Mammography and Screen-Film Mammography in Breast Cancer Screening: A Review in the Irish Breast Screening Program , 2010 .

[19]  Stuart G. Baker,et al.  A Proposed Design and Analysis for Comparing Digital and Analog Mammography , 2001 .

[20]  K S Berbaum,et al.  A contaminated binormal model for ROC data: Part I. Some interesting examples of binormal degeneracy. , 2000, Academic radiology.

[21]  J. Hilden The Area under the ROC Curve and Its Competitors , 1991, Medical decision making : an international journal of the Society for Medical Decision Making.

[22]  E. Sickles Successful methods to reduce false-positive mammography interpretations. , 2000, Radiologic clinics of North America.

[23]  Neil A. Macmillan,et al.  Detection Theory: A User's Guide , 1991 .

[24]  C. Metz,et al.  A receiver operating characteristic partial area index for highly sensitive diagnostic tests. , 1996, Radiology.

[25]  S. Rubin,et al.  Efficacy of screening mammography. A meta-analysis. , 1995, JAMA.

[26]  C. Gatsonis,et al.  Generalized Estimating Equations for Ordinal Categorical Data: Arbitrary Patterns of Missing Responses and Missingness in a Key Covariate , 1999, Biometrics.

[27]  G. Gorry,et al.  Decision analysis and clinical judgment. , 1973, The American journal of medicine.

[28]  D. Miglioretti,et al.  Performance of diagnostic mammography differs in the United States and Denmark , 2010, International Journal of Cancer.

[29]  M. Greiner,et al.  Principles and practical application of the receiver-operating characteristic analysis for diagnostic tests. , 2000, Preventive veterinary medicine.

[30]  John A. Swets,et al.  Evaluation of diagnostic systems : methods from signal detection theory , 1982 .

[31]  S. Woolf,et al.  Breast Cancer Screening: A Summary of the Evidence for the U.S. Preventive Services Task Force , 2002, Annals of Internal Medicine.

[32]  K S Berbaum,et al.  A contaminated binormal model for ROC data: Part II. A formal model. , 2000, Academic radiology.

[33]  N. Perkins,et al.  The inconsistency of "optimal" cutpoints obtained using two criteria based on the receiver operating characteristic curve. , 2006, American journal of epidemiology.

[34]  Craig K. Abbey,et al.  An Equivalent Relative Utility Metric for Evaluating Screening Mammography , 2010, Medical decision making : an international journal of the Society for Medical Decision Making.

[35]  W. Barlow,et al.  Time trends in radiologists' interpretive performance at screening mammography from the community-based Breast Cancer Surveillance Consortium, 1996-2004. , 2010, Radiology.

[36]  X H Zhou,et al.  Correcting for verification bias in studies of a diagnostic test's accuracy , 1998, Statistical methods in medical research.

[37]  C. D'Orsi,et al.  Accuracy of screening mammography interpretation by characteristics of radiologists. , 2004, Journal of the National Cancer Institute.

[38]  J. Swets,et al.  A decision-making theory of visual detection. , 1954, Psychological review.

[39]  C. Metz,et al.  Maximum likelihood estimation of receiver operating characteristic (ROC) curves from continuously-distributed data. , 1998, Statistics in medicine.

[40]  W. Youden,et al.  Index for rating diagnostic tests , 1950, Cancer.

[41]  Dev P Chakraborty,et al.  Recent advances in observer performance methodology: jackknife free-response ROC (JAFROC). , 2005, Radiation protection dosimetry.

[42]  W. W. Peterson,et al.  The theory of signal detectability , 1954, Trans. IRE Prof. Group Inf. Theory.

[43]  Deborah H Glueck,et al.  Bias in estimating accuracy of a binary screening test with differential disease verification. , 2011, Statistics in medicine.

[44]  Dev P Chakraborty,et al.  Validation and statistical power comparison of methods for analyzing free-response observer performance studies. , 2008, Academic radiology.

[45]  Constantine A Gatsonis,et al.  American College of Radiology Imaging Network digital mammographic imaging screening trial: objectives and methodology. , 2005, Radiology.

[46]  X H Zhou,et al.  Assessing the relative accuracies of two screening tests in the presence of verification bias. , 2000, Statistics in medicine.

[47]  D. Chakraborty,et al.  Free-response methodology: alternate analysis and a new observer-performance experiment. , 1990, Radiology.

[48]  S. Feig Economic challenges in breast imaging. A survivor's guide to success. , 2000, Radiologic clinics of North America.

[49]  S. Baker Putting risk prediction in perspective: relative utility curves. , 2009, Journal of the National Cancer Institute.

[50]  J. Skelly,et al.  Comparing screening mammography for early breast cancer detection in Vermont and Norway. , 2008, Journal of the National Cancer Institute.

[51]  C. Metz ROC Methodology in Radiologic Imaging , 1986, Investigative radiology.

[52]  C. Metz,et al.  "Proper" Binormal ROC Curves: Theory and Maximum-Likelihood Estimation. , 1999, Journal of mathematical psychology.

[53]  R S LEDLEY,et al.  Reasoning foundations of medical diagnosis; symbolic logic, probability, and value theory aid our understanding of how physicians reason. , 1959, Science.

[54]  K S Berbaum,et al.  A contaminated binormal model for ROC data: Part III. Initial evaluation with detection ROC data. , 2000, Academic radiology.

[55]  Yulei Jiang,et al.  BI-RADS data should not be used to estimate ROC curves. , 2010, Radiology.

[56]  J D Habbema,et al.  Application of Treatment Thresholds to Diagnostic-test Evaluation , 1997, Medical decision making : an international journal of the Society for Medical Decision Making.