Automatic abstraction of imaging observations with their characteristics from mammography reports

BACKGROUND Radiology reports are usually narrative, unstructured text, a format that makes it difficult to feed report contents into decision support systems. In addition, reports often describe multiple lesions, and it is challenging to automatically extract information on each lesion and relate it to the characteristics, anatomic locations, and other information that describe it. The goal of our work is to develop natural language processing (NLP) methods that recognize each lesion in free-text mammography reports and extract its corresponding relationships, producing a complete information frame for each lesion.

MATERIALS AND METHODS We built an NLP information extraction pipeline in the General Architecture for Text Engineering (GATE) NLP toolkit. Processing modules are executed sequentially, producing the output information frame required by a mammography decision support system. Each lesion described in the report is identified by linking it with its anatomic location in the breast. To evaluate our system, we selected 300 mammography reports from a hospital report database.

RESULTS The gold standard contained 797 lesions, and our system detected 815 lesions (780 true positives, 35 false positives, and 17 false negatives). For detecting all the imaging observations with their modifiers, precision was 94.9%, recall was 90.9%, and the F measure was 92.8%.

CONCLUSIONS Our NLP system extracts each imaging observation and its characteristics from mammography reports. Although our application focuses on the domain of mammography, we believe our approach can generalize to other domains and may narrow the gap between unstructured clinical report text and the structured information extraction needed for data mining and decision support.
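As a worked illustration of how the lesion counts in the results relate to one another, the following minimal sketch computes precision, recall, and F measure from true positive, false positive, and false negative counts using the standard definitions. The function name `detection_metrics` is hypothetical and not part of the described system. Note that these are lesion-level values derived from the detection counts alone; the precision, recall, and F measure reported in the abstract refer to imaging observations together with their modifiers, a stricter criterion, so the numbers are not expected to match.

```python
def detection_metrics(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Return (precision, recall, F1) from raw detection counts.

    Standard definitions; hypothetical helper, not the authors' code.
    """
    if tp == 0:
        # No correct detections: all three metrics are zero by convention.
        return 0.0, 0.0, 0.0
    precision = tp / (tp + fp)   # fraction of detected lesions that are real
    recall = tp / (tp + fn)      # fraction of gold-standard lesions that were found
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1


# Lesion counts taken from the results above.
TP, FP, FN = 780, 35, 17

# Consistency checks against the totals stated in the abstract.
assert TP + FP == 815  # lesions detected by the system
assert TP + FN == 797  # lesions in the gold standard

precision, recall, f1 = detection_metrics(TP, FP, FN)
print(f"lesion-level precision: {precision:.1%}")
print(f"lesion-level recall:    {recall:.1%}")
print(f"lesion-level F1:        {f1:.1%}")
```

Under these definitions the lesion-level precision is about 95.7%, recall about 97.9%, and F1 about 96.8%, which confirms that the stated counts are internally consistent (780 + 35 = 815 and 780 + 17 = 797).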
