Deep Learning in Mammography: Diagnostic Accuracy of a Multipurpose Image Analysis Software in the Detection of Breast Cancer

Objectives: The aim of this study was to evaluate the diagnostic accuracy of a multipurpose image analysis software based on deep learning with artificial neural networks for the detection of breast cancer in an independent, dual-center mammography data set.

Materials and Methods: In this retrospective, Health Insurance Portability and Accountability Act-compliant study, all patients undergoing mammography in 2012 at our institution were reviewed (n = 3228). All of their prior and follow-up mammograms from a time span of 7 years (2008–2015) were considered as a reference for clinical diagnosis. After applying exclusion criteria (missing reference standard, prior procedures or therapies), patients with a first diagnosis of a malignancy or borderline lesion were selected (n = 143). Histology or clinical long-term follow-up served as the reference standard. In a first step, a breast density- and age-matched control cohort was selected (n = 143) from the remaining patients with more than 2 years of follow-up (n = 1003). The neural network was trained with this data set. From the publicly available Breast Cancer Digital Repository data set, patients with cancer and a matched control cohort were selected (n = 35 × 2). The performance of the trained neural network was also tested with this external data set. Three radiologists (3, 5, and 10 years of experience) evaluated the test data set. In a second step, the neural network was trained with all cases from January to September and tested with cases from October to December 2012 (screening-like cohort). The radiologists also evaluated this second test data set. The areas under the receiver operating characteristic curve (AUC) of the readers and the neural network were compared. A Bonferroni-corrected P value of less than 0.016 was considered statistically significant.

Results: The mean age of patients with a lesion was 59.6 years (range, 35–88 years) and that of controls was 59.1 years (range, 35–83 years). Breast density distribution (A/B/C/D) was 21/59/42/21 in patients with a lesion and 22/60/41/20 in controls. Histologic diagnoses were invasive ductal carcinoma in 90, ductal carcinoma in situ in 13, invasive lobular carcinoma in 13, mucinous carcinoma in 3, and borderline lesion in 12 patients. In the first step, the AUC of the trained neural network was 0.81 and was comparable on the test cases (0.79; P = 0.63). One of the radiologists showed almost equal performance (0.83; P = 0.17), whereas 2 were significantly better (0.91 and 0.94; P < 0.016). In the second step, the performance of the neural network (0.82) was not significantly different from that of the radiologists (0.77–0.87; P > 0.016); however, the radiologists were consistently less sensitive and more specific than the neural network.

Conclusions: Current state-of-the-art artificial neural networks for general image analysis are able to detect cancer in mammograms with accuracy similar to that of radiologists, even in a screening-like cohort with low breast cancer prevalence.
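The two quantities the abstract relies on, the area under the receiver operating characteristic curve and the Bonferroni-corrected significance threshold, can be illustrated with a minimal Python sketch. It rests on assumptions that the abstract does not state: a base significance level of 0.05 and three comparisons (one per radiologist), which gives 0.05 / 3 ≈ 0.0167, consistent with the reported threshold of 0.016. The function names and the toy labels and scores are hypothetical and are not data from the study. The sketch computes only an empirical AUC and does not implement a statistical test for comparing two AUCs.

```python
from typing import Sequence


def bonferroni_alpha(alpha: float, n_comparisons: int) -> float:
    """Per-comparison significance threshold under Bonferroni correction."""
    if n_comparisons < 1:
        raise ValueError("n_comparisons must be at least 1")
    return alpha / n_comparisons


def auc_mann_whitney(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Empirical AUC: the fraction of (positive, negative) case pairs in
    which the positive case has the higher score; ties count as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (1 = lesion, 0 = control); NOT from the study.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]

# Assumed: base alpha of 0.05 and three reader comparisons.
threshold = bonferroni_alpha(0.05, 3)
toy_auc = auc_mann_whitney(toy_labels, toy_scores)

print(f"Bonferroni threshold: {threshold:.4f}")
print(f"Toy AUC: {toy_auc:.3f}")
```

The pairwise formulation is quadratic in the number of cases, which is acceptable for cohorts of a few hundred patients such as those described here; a rank-based implementation would be preferable for much larger data sets.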
