Screening mammography: test set data can reasonably describe actual clinical reporting.

PURPOSE To establish the extent to which test set reading can represent actual clinical reporting in screening mammography.

MATERIALS AND METHODS Institutional ethics approval was granted, and informed consent was obtained from each participating screen reader; the need for informed consent with respect to the use of patient materials was waived. Two hundred mammographic examinations were selected from examinations reported by 10 individual expert screen readers, resulting in 10 reader-specific test sets. Data generated from actual clinical reports were compared with three test set conditions: clinical test set reading with prior images, laboratory test set reading with prior images, and laboratory test set reading without prior images. A further set of five expert screen readers interpreted a common set of images under two identical test set conditions to establish a baseline for intraobserver variability. Readers assigned confidence scores (from 1 to 4) to their decisions. Region-of-interest (ROI) figures of merit (FOMs) and side-specific sensitivity and specificity were calculated for the actual clinical reporting of each reader-specific test set and compared with those for the three test set conditions. Agreement between pairs of readings was assessed by using the Kendall coefficient of concordance (W).

RESULTS Agreement between actual clinical reporting and the test set conditions was moderate to acceptable (W = 0.69-0.73, P < .01) and reasonably close to the established intraobserver baseline (W = 0.77, P < .01); agreement was lowest when prior images were excluded. Median ROI FOMs were higher for the test set conditions than for actual clinical reporting, possibly reflecting changes in sensitivity.

CONCLUSION Reasonable levels of agreement between actual clinical reporting and test set conditions can be achieved, although inflated sensitivity may be evident under test set conditions.
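The agreement statistic used above, the Kendall coefficient of concordance (W), can be sketched as follows for a reader-by-case matrix of confidence scores. This is a minimal illustrative implementation (the function name and the toy data are assumptions, not taken from the study); it uses mid-ranks with the standard tie correction, which matters here because 1-4 confidence scores produce many ties.

```python
import numpy as np
from scipy.stats import rankdata

def kendalls_w(scores):
    """Kendall's coefficient of concordance W for an (m readers, n cases)
    score matrix, using mid-ranks and the standard tie correction."""
    scores = np.asarray(scores, dtype=float)
    m, n = scores.shape
    ranks = np.apply_along_axis(rankdata, 1, scores)  # mid-ranks within each reader
    rank_sums = ranks.sum(axis=0)                     # total rank per case
    s = ((rank_sums - rank_sums.mean()) ** 2).sum()   # spread of the rank sums
    # Tie correction: for each reader, sum (t^3 - t) over tied groups of size t.
    t_corr = 0.0
    for row in ranks:
        _, counts = np.unique(row, return_counts=True)
        t_corr += float((counts ** 3 - counts).sum())
    return 12 * s / (m ** 2 * (n ** 3 - n) - m * t_corr)

# Three readers in perfect agreement over four cases yield W = 1.
print(kendalls_w([[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]]))  # → 1.0
```

W ranges from 0 (no agreement beyond chance) to 1 (perfect agreement), so the reported values of 0.69-0.73 against a 0.77 intraobserver baseline indicate test set agreement approaching the limit set by each reader's own repeatability.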
