Using natural language processing to extract mammographic findings

OBJECTIVE Structured data on mammographic findings are difficult to obtain without manual review. We developed and evaluated a rule-based natural language processing (NLP) system to extract mammographic findings from free-text mammography reports.

MATERIALS AND METHODS The NLP system extracted four mammographic findings (mass, calcification, asymmetry, and architectural distortion) using a dictionary look-up method on 93,705 mammography reports from Group Health. Status annotations and anatomical location annotations were associated with each NLP-detected finding through association rules. After excluding negated, uncertain, and historical findings, affirmative mentions of detected findings were summarized. Confidence flags were developed to denote reports with highly confident NLP results and reports with possible NLP errors. A random sample of 100 reports was manually abstracted to evaluate the accuracy of the system.

RESULTS The NLP system correctly coded 96 to 99 of the 100 sampled reports, depending on the finding. Sensitivity, specificity, and negative predictive value exceeded 0.92 for all findings. Positive predictive values were relatively low for some findings because of their low prevalence.

DISCUSSION Our NLP system was implemented entirely in SAS Base, which makes it portable and easy to implement. It performed reasonably well and has multiple applications, such as using confidence flags as a filter to improve the efficiency of manual review. Refining the term library and association rules, and testing on more diverse samples, may further improve its performance.

CONCLUSION Our NLP system successfully extracts clinically useful information from mammography reports. Moreover, SAS is a feasible platform for implementing NLP algorithms.
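
The dictionary look-up and status-filtering steps described under MATERIALS AND METHODS can be illustrated with a minimal sketch. This is not the authors' SAS implementation: it is written in Python for brevity, the term lists and status cues are hypothetical stand-ins for the paper's library, status is assigned crudely at the sentence level instead of through the paper's association rules, and anatomical location and confidence flags are omitted.

```python
import re

# Hypothetical mini-dictionary of finding terms (the paper's library is not reproduced here).
FINDING_TERMS = {
    "mass": ["mass", "masses"],
    "calcification": ["calcification", "calcifications", "microcalcifications"],
    "asymmetry": ["asymmetry", "asymmetries"],
    "architectural distortion": ["architectural distortion"],
}

# Hypothetical cue phrases for the three excluded statuses.
STATUS_CUES = {
    "negated": ["no", "without", "negative for"],
    "uncertain": ["possible", "questionable", "cannot exclude"],
    "historical": ["previously", "prior", "history of"],
}


def _pattern(terms):
    """Compile a case-insensitive whole-word pattern matching any of the terms."""
    alternatives = "|".join(
        re.escape(t) for t in sorted(terms, key=len, reverse=True)
    )
    return re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)


FINDING_PATTERNS = {name: _pattern(terms) for name, terms in FINDING_TERMS.items()}
STATUS_PATTERNS = {name: _pattern(cues) for name, cues in STATUS_CUES.items()}


def sentence_status(sentence):
    """Return the first matching excluded status, or 'affirmative' if none match."""
    for status, pattern in STATUS_PATTERNS.items():
        if pattern.search(sentence):
            return status
    return "affirmative"


def extract_findings(report):
    """Return the set of findings mentioned affirmatively in a free-text report."""
    affirmed = set()
    # Naive sentence split on terminal punctuation followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", report):
        if sentence_status(sentence) != "affirmative":
            continue  # drop negated, uncertain, and historical mentions
        for finding, pattern in FINDING_PATTERNS.items():
            if pattern.search(sentence):
                affirmed.add(finding)
    return affirmed


report = (
    "There is a 1 cm mass in the left breast. "
    "No suspicious calcifications. "
    "Possible architectural distortion on the right."
)
print(extract_findings(report))
```

For the sample report, only the mass is retained, because the calcification mention is negated and the architectural distortion mention is uncertain. A sentence-level cue check like this over-excludes (for example, any sentence containing "prior" is dropped), which is one reason a real system would scope each cue to the finding it modifies.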
