Toward Automatic Mammography Auditing via Universal Language Model Fine Tuning

Successful mammography screening reduces average breast cancer mortality by 20% to 22%. However, large-scale mammography screening is subject to variability due to the inherently subjective nature of evaluating mammograms. The Mammography Quality Standards Act (MQSA) requires radiologists to report their outcomes in an attempt to reduce the variability of mammography practice. Therefore, to improve the quality of mammography screening and to comply with MQSA requirements, hospitals devote significant resources to tracking radiologists' performance in predicting cancer over time. One of the core bottlenecks is the manual review of pathology reports by experts. In this paper we successfully pilot an AI tool that automates this review. We take a two-step approach that leverages the Universal Language Model Fine-tuning (ULMFiT) deep learning framework. First, we train a ULMFiT model that classifies biopsy reports as Benign vs. Malignant with 97% precision and 96% recall. We then fine-tune a second ULMFiT-based model that further classifies Benign reports into Benign vs. Benign (High-Risk Lesion) with 77% precision and 92% recall. During deployment, the model identified 24 previously mislabeled reports, which were then corrected. Finally, we present a pilot deployment study that demonstrates 94% or higher agreement between human experts and the corresponding AI models.
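The two-step cascade described above can be sketched as follows. `stage1_predict` and `stage2_predict` stand in for the two fine-tuned ULMFiT classifiers; their names and interfaces are illustrative assumptions, not the authors' actual code. The precision/recall helper shows how the reported per-class metrics are defined.

```python
from typing import Callable, List, Tuple

Label = str  # "Malignant", "Benign", or "Benign (High Risk Lesion)"


def cascade_label(report: str,
                  stage1_predict: Callable[[str], Label],
                  stage2_predict: Callable[[str], Label]) -> Label:
    """Two-step classification: the first model separates Malignant from
    Benign; Benign reports are passed to the second model, which flags
    high-risk lesions. Both predictors are hypothetical stand-ins for the
    fine-tuned ULMFiT models."""
    if stage1_predict(report) == "Malignant":
        return "Malignant"
    return stage2_predict(report)  # "Benign" or "Benign (High Risk Lesion)"


def precision_recall(y_true: List[Label],
                     y_pred: List[Label],
                     positive: Label) -> Tuple[float, float]:
    """Precision and recall for one class, matching the definitions behind
    the figures quoted in the abstract."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

For example, with toy keyword-based stand-ins for the two classifiers, `cascade_label("invasive carcinoma", stage1, stage2)` routes the report to the Malignant branch, while a report the first model calls Benign is further split by the second model.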