Impact of prevalence and case distribution in lab-based diagnostic imaging studies

Abstract. We investigated the effects of prevalence and case distribution on radiologist diagnostic performance, as measured by area under the receiver operating characteristic curve (AUC), sensitivity, and specificity, in lab-based reader studies evaluating imaging devices. Our retrospective reader studies compared full-field digital mammography (FFDM) to screen-film mammography (SFM) for women with dense breasts. Mammograms were acquired from the prospective Digital Mammographic Imaging Screening Trial (DMIST). We performed five reader studies that differed in cancer prevalence and in the distribution of noncancer cases. Twenty radiologists participated in each reader study. Using split-plot study designs, we collected recall decisions and multilevel scores from the radiologists for calculating sensitivity, specificity, and AUC. Differences in reader-averaged AUCs slightly favored SFM over FFDM (largest AUC difference: 0.047, SE = 0.023, p = 0.047), where the standard error accounts for both reader and case variability. The differences were not significant at the Bonferroni-corrected level of 0.01 (0.05/5 reader studies). Differences in sensitivity and specificity were likewise inconclusive. Prevalence had little effect on AUC (largest difference: 0.02), whereas sensitivity increased and specificity decreased as prevalence increased. We conclude that AUC is robust to changes in prevalence, while radiologists made recall decisions more aggressively as prevalence increased.
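For readers unfamiliar with these endpoints, the sketch below illustrates how per-reader sensitivity, specificity, and empirical (nonparametric) AUC can be computed from binary recall decisions and multilevel scores of the kind described above. It is a minimal illustration only: the toy data and function names are hypothetical, not taken from the study, and the published analysis additionally accounts for reader and case variability with multireader multicase (MRMC) variance methods rather than the plain point estimates shown here.

import numpy as np

def empirical_auc(scores_cancer, scores_noncancer):
    """Empirical AUC via the Mann-Whitney U statistic: the probability
    that a randomly chosen cancer case scores higher than a randomly
    chosen noncancer case, counting ties as one half."""
    pos = np.asarray(scores_cancer, dtype=float)
    neg = np.asarray(scores_noncancer, dtype=float)
    # Compare every cancer score against every noncancer score.
    diff = pos[:, None] - neg[None, :]
    return (diff > 0).mean() + 0.5 * (diff == 0).mean()

def sensitivity_specificity(recall_cancer, recall_noncancer):
    """Sensitivity = fraction of cancers recalled;
    specificity = fraction of noncancers not recalled."""
    sens = np.mean(recall_cancer)
    spec = 1.0 - np.mean(recall_noncancer)
    return sens, spec

# Toy data for one hypothetical reader: 5-level (BI-RADS-like) scores
# and binary recall decisions (1 = recall, 0 = no recall).
auc = empirical_auc([4, 5, 3, 5], [1, 2, 3, 1, 2])
sens, spec = sensitivity_specificity([1, 1, 0, 1], [0, 0, 1, 0, 0])
print(f"AUC={auc:.3f}, sensitivity={sens:.2f}, specificity={spec:.2f}")

In a study like the one reported, such per-reader estimates would be averaged over the twenty radiologists, and the standard errors of the modality differences would be estimated with MRMC methods that treat both readers and cases as random effects.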
