Automated Indexing of Mammography Reports Using Linear Least Squares Fit

Radiologists routinely document mammography results in free-text dictations. Over the last decade, the volume of mammography performed in the U.S. has increased. As a result, the American College of Radiology has standardized the practice of screening mammography by introducing a controlled vocabulary and practice standards tracked by audits. Extracting data from these free-text reports has therefore become important for processing and tracking patient information. This paper describes a method for automatically extracting accepted terms from free-text reports. The Breast Imaging Reporting and Data System (BI-RADS) lexicon defines a hierarchy of terms for describing findings in mammograms. We use the Linear Least Squares Fit (LLSF) mapping algorithm to classify radiology reports into the appropriate BI-RADS terms. Our system demonstrates a reasonable processing time. We tested its performance at different thresholds, which can be set to favor either precision or recall, since the two are inversely related. The threshold that yielded the most exact matches between our system and the gold standard had an average precision of 83.4% ± 5.3% and an