Automatic extraction of imaging observation and assessment categories from breast magnetic resonance imaging reports with natural language processing

Abstract

Background: Structured reports are not widely used, so most reports exist as free text. Manual data extraction by experts is time-consuming and error-prone, whereas extraction by natural language processing (NLP) is a potential solution that could improve diagnostic efficiency and accuracy. The purpose of this study was to evaluate an NLP program that determines American College of Radiology Breast Imaging Reporting and Data System (BI-RADS) descriptors and final assessment categories from breast magnetic resonance imaging (MRI) reports.

Methods: This cross-sectional study involved 2330 breast MRI reports in the electronic medical record from 2009 to 2017. We used 1635 reports to create a revised BI-RADS MRI lexicon and synonym lists and to iteratively develop an NLP system. The remaining 695 reports, which were not used for developing the system, served as an independent test set for the final evaluation of the NLP system. The recall and precision of the NLP algorithm in detecting the revised BI-RADS MRI descriptors and BI-RADS categories from the free-text reports were evaluated against a reference standard of manual human review.

Results: There was a high level of agreement between the two manual reviewers, with a κ value of 0.95. For all breast imaging reports, the NLP algorithm demonstrated a recall of 78.5% and a precision of 86.1% for correct identification of the revised BI-RADS MRI descriptors and the BI-RADS categories. NLP generated the total results in <1 s, whereas the two manual reviewers averaged 3.38 and 3.23 min per report, respectively.

Conclusions: The NLP algorithm demonstrates high recall and precision for information extraction from free-text reports. This approach will help to narrow the gap between unstructured report text and structured data, which is needed in decision support and other applications.
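To make the evaluation metrics concrete, the following minimal Python sketch shows how precision, recall, and Cohen's κ can be computed for category extraction. It is an illustration only and not the study's NLP system, whose implementation is not described here. The regular expression, the function names (`extract_category`, `precision_recall`, `cohen_kappa`), the toy reports, and the labels are all hypothetical. The scoring rule is also an assumption: a wrong extraction counts as both a false positive and a false negative.

```python
import re
from collections import Counter
from typing import List, Optional, Tuple

# Hypothetical pattern (an assumption, not the study's lexicon): matches
# phrases such as "BI-RADS 4", "BIRADS: 1", or "BI-RADS category 2".
BIRADS_PATTERN = re.compile(
    r"BI-?RADS\s*(?:category)?\s*:?\s*([0-6])", re.IGNORECASE
)


def extract_category(report: str) -> Optional[int]:
    """Return the last BI-RADS category mentioned in the text, or None."""
    matches = BIRADS_PATTERN.findall(report)
    return int(matches[-1]) if matches else None


def precision_recall(
    predicted: List[Optional[int]], reference: List[Optional[int]]
) -> Tuple[float, float]:
    """Precision = TP / (TP + FP); recall = TP / (TP + FN).

    Assumed scoring rule: a wrong extraction is both a false positive
    and a false negative; a missed extraction is a false negative only.
    """
    pairs = list(zip(predicted, reference))
    tp = sum(1 for p, r in pairs if p is not None and p == r)
    fp = sum(1 for p, r in pairs if p is not None and p != r)
    fn = sum(1 for p, r in pairs if r is not None and p != r)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def cohen_kappa(a: List[int], b: List[int]) -> float:
    """Cohen's kappa: (observed - expected) / (1 - expected) agreement."""
    n = len(a)
    observed = sum(1 for x, y in zip(a, b) if x == y) / n
    count_a, count_b = Counter(a), Counter(b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:  # both raters used one identical label throughout
        return 1.0
    return (observed - expected) / (1 - expected)


# Toy free-text reports (invented for illustration).
reports = [
    "Irregular mass with rim enhancement. BI-RADS category 4.",
    "No suspicious enhancement. BIRADS: 1",
    "Non-mass enhancement, segmental distribution. Assessment: suspicious.",
    "Benign findings. BI-RADS 2.",
]
reference = [4, 1, 4, 2]  # hypothetical manual-review labels

predicted = [extract_category(text) for text in reports]
precision, recall = precision_recall(predicted, reference)
print(predicted)          # [4, 1, None, 2]
print(precision, recall)  # 1.0 0.75

# Hypothetical labels from two manual reviewers, for the agreement statistic.
reviewer_a = [4, 1, 4, 2]
reviewer_b = [4, 1, 3, 2]
kappa = cohen_kappa(reviewer_a, reviewer_b)
print(round(kappa, 3))
```

In this toy set, the third report states no explicit category, so the pattern misses it. That lowers recall but leaves precision untouched, the same kind of gap as between the reported recall (78.5%) and precision (86.1%).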
