Inconsistent Performance of Deep Learning Models on Mammogram Classification.

OBJECTIVES The performance of recently developed deep learning models for image classification surpasses that of radiologists. However, questions remain about whether model performance is consistent and generalizes to unseen external data. The purpose of this study is to determine whether the high performance of deep learning models on mammograms transfers to external data with a different data distribution.

MATERIALS AND METHODS Six deep learning models (three published models with high performance and three models designed by us) were evaluated on four mammogram data sets: three public (Digital Database for Screening Mammography [DDSM], INbreast, and Mammographic Image Analysis Society [MIAS]) and one private (UKy). The models were trained and validated on either DDSM alone or a combined data set that included DDSM, and were then tested on the three external data sets. The area under the receiver operating characteristic curve (AUC) was used to evaluate model performance.

RESULTS The three published models reported AUCs between 0.88 and 0.95 on the validation data set. Our models achieved AUCs between 0.71 (95% confidence interval [CI]: 0.70-0.72) and 0.79 (95% CI: 0.78-0.80) on the same validation data set. However, the AUCs of all six models on the three external test data sets were significantly lower, ranging only from 0.44 (95% CI: 0.43-0.45) to 0.65 (95% CI: 0.64-0.66).

CONCLUSION Our results demonstrate inconsistent performance across data sets and models, indicating that the high performance of deep learning models on one data set cannot be readily transferred to unseen external data sets. These models need further assessment and validation before being applied in clinical practice.
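The evaluation metric described above, the area under the receiver operating characteristic curve with a 95% confidence interval, can be illustrated with a minimal sketch. The abstract does not state how the confidence intervals were computed; the percentile bootstrap used here is an assumption, and the function names `auc` and `bootstrap_auc_ci` and the toy labels and scores are hypothetical, not taken from the study.

```python
import random


def auc(labels, scores):
    """Area under the ROC curve via the Mann-Whitney U statistic.

    Equals the fraction of (positive, negative) pairs in which the positive
    case receives the higher score; tied pairs count as 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the AUC (an assumed method, see lead-in)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        # Resample cases with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # AUC is undefined when a resample has only one class
        stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper


# Hypothetical toy data: 1 = malignant, 0 = benign; scores are model outputs.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auc = auc(toy_labels, toy_scores)
toy_ci = bootstrap_auc_ci(toy_labels, toy_scores)
```

Running the same two functions on the validation split and on each external test set would give per-data-set AUCs that can be compared directly, which is the kind of comparison the results report.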
