Lack of agreement between radiologists: implications for image-based model observers

Abstract. We tested the agreement of radiologists’ rankings of different reconstructions of breast computed tomography images, based both on their diagnostic (classification) performance and on their subjective image quality assessments. We used 102 pathology-proven cases (62 malignant, 40 benign) and an iterative image reconstruction (IIR) algorithm to obtain 24 reconstructions per case with different image appearances. Using image feature analysis, we selected three IIR settings, one clinical reconstruction, and 50 lesions. The reconstructions spanned a range of image quality from smooth/low-noise to sharp/high-noise, with a corresponding range in classifier performance from 0.62 to 0.96 in area under the receiver operating characteristic curve (AUC). Six experienced Mammography Quality Standards Act (MQSA) radiologists rated the likelihood of malignancy for each lesion. We conducted an additional reader study with the same radiologists and a subset of 30 lesions, in which the radiologists ranked each reconstruction according to their preference. The six radiologists disagreed on which reconstruction produced images with the highest diagnostic content, but they preferred the mid-sharp/mid-noise image appearance over the others. However, the reconstruction they preferred most did not match the one on which they performed best. Given these disagreements, it may be difficult to develop a single image-based model observer that is representative of a population of radiologists for this particular imaging task.
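
The core comparison described above, per-reader diagnostic performance versus preference ranking of reconstructions, can be illustrated with a short sketch. The code below is a minimal, hypothetical example and not the study's analysis code: the array shapes, the simulated ratings, the use of scikit-learn's roc_auc_score, and Kendall's W as the rank-agreement measure are all assumptions made for illustration; a formal multireader, multicase ROC analysis would be used in practice.

```python
# Minimal sketch (hypothetical data, not the study's analysis code):
# per-reader, per-reconstruction AUC from malignancy ratings, plus
# Kendall's W as one way to quantify agreement of preference rankings.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

n_readers, n_recons, n_lesions = 6, 4, 50
truth = rng.integers(0, 2, size=n_lesions)          # 1 = malignant, 0 = benign
# ratings[r, k, i]: reader r's likelihood-of-malignancy score for lesion i
# under reconstruction k (simulated here as a stand-in for real ratings).
ratings = rng.random((n_readers, n_recons, n_lesions)) + 0.5 * truth

# Per-reader, per-reconstruction AUC.
auc = np.array([[roc_auc_score(truth, ratings[r, k])
                 for k in range(n_recons)] for r in range(n_readers)])
print("AUC by reader (rows) and reconstruction (columns):\n", auc.round(3))

# prefs[r, k]: reader r's preference rank for reconstruction k (1 = most preferred).
prefs = np.array([rng.permutation(n_recons) + 1 for _ in range(n_readers)])

def kendalls_w(ranks):
    """Kendall's coefficient of concordance for an (m raters x n items) rank matrix."""
    m, n = ranks.shape
    rank_sums = ranks.sum(axis=0)
    s = ((rank_sums - rank_sums.mean()) ** 2).sum()
    return 12.0 * s / (m ** 2 * (n ** 3 - n))

print("Agreement of preference rankings (Kendall's W):", round(kendalls_w(prefs), 3))
```

A low Kendall's W, combined with per-reader AUCs that peak at different reconstructions, would reflect the kind of disagreement between preference and performance reported in the abstract.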
