Paper information: Multi-Institutional Assessment and Crowdsourcing Evaluation of Deep Learning for Automated Classification of Breast Density.

OBJECTIVE We developed deep learning algorithms to automatically assess BI-RADS breast density.

METHODS Using a large multi-institution patient cohort of 108,230 digital screening mammograms from the Digital Mammographic Imaging Screening Trial, we investigated the effect of data, model, and training parameters on overall model performance and obtained a crowdsourced evaluation from attendees of the ACR 2019 Annual Meeting.

RESULTS Our best-performing algorithm achieved good agreement with radiologists who were qualified interpreters of mammograms, with a four-class κ of 0.667. When training used images sampled randomly from the data set rather than an equal number of images from each density category, the model predictions were biased away from low-prevalence categories such as extremely dense breasts. As a result, equal-class sampling gave higher sensitivity and lower specificity for predicting dense breasts than random sampling did. We also found that model performance degrades when the model is evaluated on digital mammography data formats that differ from the one it was trained on, which emphasizes the importance of multi-institutional training sets. Lastly, we showed that crowdsourced annotations, including those from attendees who routinely read mammograms, had higher agreement with our algorithm than with the original interpreting radiologists.

CONCLUSION We identified parameters that can influence model performance and showed how crowdsourcing can be used for evaluation. This study was performed in tandem with the development of the ACR AI-LAB, a platform for democratizing artificial intelligence.
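To make the reported agreement metrics concrete, the following is a minimal sketch of how a four-class κ and a dense versus non-dense sensitivity and specificity could be computed from paired labels. It is an illustration under stated assumptions, not the authors' evaluation code: the abstract does not say whether κ was weighted, so the sketch uses unweighted Cohen's κ; the integer encoding of the BI-RADS categories, the function names, and the toy labels are all hypothetical. The only domain fact relied on is that "dense" covers the heterogeneously dense and extremely dense categories.

```python
from collections import Counter

# Hypothetical encoding (not from the paper): BI-RADS density a, b, c, d -> 0, 1, 2, 3.
CATEGORIES = (0, 1, 2, 3)
# "Dense" groups heterogeneously dense (c) and extremely dense (d).
DENSE = frozenset({2, 3})


def cohen_kappa(rater_a, rater_b, categories=CATEGORIES):
    """Unweighted Cohen's kappa for two equal-length label sequences."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("label sequences must be non-empty and equal in length")
    n = len(rater_a)
    # Observed agreement: fraction of cases where both raters give the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal frequencies, summed over classes.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)
    if expected == 1.0:
        return float("nan")  # kappa is undefined when chance agreement is total
    return (observed - expected) / (1.0 - expected)


def dense_sensitivity_specificity(reference, predicted, dense=DENSE):
    """Binarize four-class labels into dense / non-dense and score the predictions."""
    tp = fn = tn = fp = 0
    for ref, pred in zip(reference, predicted):
        if ref in dense:
            if pred in dense:
                tp += 1
            else:
                fn += 1
        else:
            if pred in dense:
                fp += 1
            else:
                tn += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Toy labels for illustration only; these are not data from the study.
radiologist = [0, 1, 1, 2, 2, 3, 1, 2]
model = [0, 1, 2, 2, 2, 3, 1, 1]

kappa = cohen_kappa(radiologist, model)
sensitivity, specificity = dense_sensitivity_specificity(radiologist, model)
```

With these toy labels, six of eight cases agree and chance agreement is 20/64, so κ works out to 7/11 (about 0.64), and both sensitivity and specificity for dense breasts are 0.75. A weighted κ, which penalizes disagreements by how many categories apart they are, would need a weight matrix in place of the exact-match test.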
