Multi-path deep learning model for automated mammographic density categorization

Breast density is one of the strongest risk factors for breast cancer. The purpose of this study is to develop a deep learning model for BI-RADS density classification on digital mammograms (DMs). With IRB approval, 2581 DMs were retrospectively collected from 672 women at our institution. We designed a multi-path deep convolutional neural network (MP-DCNN) to classify each DM into one of the four BI-RADS density categories. The MP-DCNN has four inputs: (1) the subsampled DM (800 μm pixel spacing), (2) a mask of the dense area (MDA) obtained with a U-net (800 μm pixel spacing), (3) the largest square region of interest (ROI) within the breast region of the mammogram (100 μm pixel spacing), and (4) the automatically computed percentage of breast density (PD). A single-path DCNN with the subsampled DM (800 μm pixel spacing) as its only input served as the baseline. An experienced Mammography Quality Standards Act (MQSA) radiologist provided the BI-RADS density category and the PD, obtained by interactive thresholding, as the reference standards. With ten-fold cross-validation, the BI-RADS categories assigned by the MP-DCNN agreed with the radiologist's assessment for 2068 of the 2581 cases (accuracy = 80.7%, weighted kappa = 0.83), and the accuracy reached 89.0% when the breasts were categorized as non-dense (BI-RADS A & B) or dense (BI-RADS C & D). For comparison, the single-path baseline DCNN agreed with the radiologist in 1906 of the 2581 cases (accuracy = 73.8%, weighted kappa = 0.75). The improvement in BI-RADS classification from the baseline to the MP-DCNN was statistically significant (p < 0.001).
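
The agreement measures reported above (four-category accuracy, weighted kappa, and accuracy after merging the categories into non-dense and dense) can be illustrated with a minimal sketch. It is not the study's evaluation code: the function names and the eight toy labels are hypothetical, and quadratic weights are an assumption, since the weighting scheme used for the kappa is not stated here.

```python
# Illustrative agreement metrics for BI-RADS density categories A-D.
# Assumptions (not from the study): quadratic kappa weights, toy labels,
# and all function names below.

CATEGORIES = ["A", "B", "C", "D"]


def accuracy(reference, predicted):
    """Fraction of cases where the prediction matches the reference."""
    return sum(r == p for r, p in zip(reference, predicted)) / len(reference)


def weighted_kappa(reference, predicted, categories=CATEGORIES, power=2):
    """Cohen's weighted kappa; power=2 gives quadratic weights (assumed)."""
    index = {c: i for i, c in enumerate(categories)}
    k, n = len(categories), len(reference)

    # Confusion matrix: rows are reference labels, columns are predictions.
    observed = [[0] * k for _ in range(k)]
    for r, p in zip(reference, predicted):
        observed[index[r]][index[p]] += 1

    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    # Weighted observed disagreement versus weighted chance disagreement.
    num = den = 0.0
    for i in range(k):
        for j in range(k):
            w = (abs(i - j) / (k - 1)) ** power
            num += w * observed[i][j]
            den += w * row[i] * col[j] / n
    return 1.0 - num / den


def to_binary(labels):
    """Merge A and B into non-dense, C and D into dense."""
    return ["dense" if c in ("C", "D") else "non-dense" for c in labels]


# Toy labels for illustration only, not data from the study.
reference = ["A", "B", "B", "C", "C", "D", "D", "A"]
predicted = ["A", "B", "C", "C", "C", "D", "C", "B"]

print(accuracy(reference, predicted))                         # 0.625
print(round(weighted_kappa(reference, predicted), 4))         # 0.8125
print(accuracy(to_binary(reference), to_binary(predicted)))   # 0.875
```

In the toy example, all three errors are between adjacent categories, so the weighted kappa penalizes them only lightly, and two of them vanish once the categories are merged into non-dense and dense. This is why the binary accuracy is higher than the four-category accuracy.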