A Multi-site Study of a Breast Density Deep Learning Model for Full-field Digital Mammography Images and Synthetic Mammography Images

Purpose: To develop a Breast Imaging Reporting and Data System (BI-RADS) breast density deep learning (DL) model in a multisite setting for synthetic two-dimensional mammographic (SM) images derived from digital breast tomosynthesis examinations by using full-field digital mammographic (FFDM) images and limited SM data.

Materials and Methods: In this retrospective study, a DL model was trained to predict BI-RADS breast density by using FFDM images acquired from 2008 to 2017 (site 1: 57 492 patients, 187 627 examinations, 750 752 images). The FFDM model was evaluated by using SM datasets from two institutions (site 1: 3842 patients, 3866 examinations, 14 472 images, acquired from 2016 to 2017; site 2: 7557 patients, 16 283 examinations, 63 973 images, acquired from 2015 to 2019). Each of the three datasets was split into training, validation, and test sets. Adaptation methods were investigated to improve performance on the SM datasets, and the effect of dataset size on each adaptation method was considered. Statistical significance was assessed by using CIs, which were estimated by bootstrapping.

Results: Without adaptation, the model demonstrated substantial agreement with the original reporting radiologists for all three datasets (site 1 FFDM: linearly weighted Cohen κ [κw] = 0.75 [95% CI: 0.74, 0.76]; site 1 SM: κw = 0.71 [95% CI: 0.64, 0.78]; site 2 SM: κw = 0.72 [95% CI: 0.70, 0.75]). With adaptation by using only 500 SM images from the respective site, performance improved significantly for site 2 but not for site 1 (site 1: κw = 0.72 [95% CI: 0.66, 0.79], 0.71 vs 0.72, P = .80; site 2: κw = 0.79 [95% CI: 0.76, 0.81], 0.72 vs 0.79, P < .001).

Conclusion: A BI-RADS breast density DL model demonstrated strong performance on FFDM and SM images from two institutions without training on SM images, and its performance improved when it was adapted by using a small number of SM images.

Supplemental material is available for this article. Published under a CC BY 4.0 license.
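The agreement metric and interval estimate reported above can be illustrated with a minimal sketch. It assumes the four BI-RADS density categories are encoded as integers 0 to 3, that resampling is done per examination, and that a percentile bootstrap is used; the abstract does not state these details, so they are assumptions, as are the function names `linear_weighted_kappa` and `bootstrap_ci` and the synthetic labels in the demo.

```python
import numpy as np


def linear_weighted_kappa(a, b, n_classes=4):
    """Linearly weighted Cohen kappa between two integer label arrays."""
    a = np.asarray(a, dtype=int)
    b = np.asarray(b, dtype=int)
    # Joint distribution of (rater a, rater b) labels.
    observed = np.zeros((n_classes, n_classes))
    np.add.at(observed, (a, b), 1)
    observed /= observed.sum()
    # Distribution expected by chance from the two marginals.
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0))
    # Linear disagreement weights: |i - j| scaled to [0, 1].
    idx = np.arange(n_classes)
    weights = np.abs(idx[:, None] - idx[None, :]) / (n_classes - 1)
    denom = (weights * expected).sum()
    if denom == 0:
        # Undefined when chance disagreement is zero (single category only).
        return float("nan")
    return 1.0 - (weights * observed).sum() / denom


def bootstrap_ci(a, b, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the weighted kappa (assumed variant)."""
    rng = np.random.default_rng(seed)
    a = np.asarray(a, dtype=int)
    b = np.asarray(b, dtype=int)
    n = len(a)
    stats = np.empty(n_boot)
    for i in range(n_boot):
        # Resample examinations with replacement, keeping pairs together.
        sample = rng.integers(0, n, size=n)
        stats[i] = linear_weighted_kappa(a[sample], b[sample])
    lower, upper = np.nanquantile(stats, [alpha / 2, 1 - alpha / 2])
    return float(lower), float(upper)


# Demo on synthetic labels (not data from the study).
rng = np.random.default_rng(42)
radiologist = rng.integers(0, 4, size=500)
# Hypothetical model output: mostly agrees, sometimes off by one category.
noise = rng.choice([-1, 0, 1], size=500, p=[0.15, 0.70, 0.15])
model = np.clip(radiologist + noise, 0, 3)

kappa = linear_weighted_kappa(radiologist, model)
ci_low, ci_high = bootstrap_ci(radiologist, model)
print(f"kappa_w = {kappa:.2f} (95% CI: {ci_low:.2f}, {ci_high:.2f})")
```

Linear weights penalize a two-category disagreement twice as much as an adjacent-category one, which suits an ordinal scale such as BI-RADS density. If several examinations come from the same patient, resampling at the patient level would be a more conservative choice than the per-examination resampling shown here.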
