Validation of natural language processing to extract breast cancer pathology procedures and results

Background: Pathology reports typically require manual review to abstract research data. We developed a natural language processing (NLP) system to automatically interpret free-text breast pathology reports with limited assistance from manual abstraction. Methods: We used an iterative approach that combined machine learning algorithms with constructed groups of related findings to identify breast-related procedures and results from free-text pathology reports. We evaluated the NLP system using an all-or-nothing approach to determine which reports could be processed entirely by NLP and which needed manual review beyond NLP. We divided 3234 reports into development (2910, 90%) and evaluation (324, 10%) sets, using manually reviewed pathology data as our gold standard. Results: NLP correctly coded 12.7% of the evaluation set, flagged 49.1% of reports for manual review, incorrectly coded 30.8%, and correctly omitted 7.4% as irrelevant (i.e., not breast-related). Common procedures and results were identified correctly (e.g., invasive ductal with 95.5% precision and 94.0% sensitivity), but entire reports were flagged for manual review because of rare findings and substantial variation in pathology report text. Conclusions: The NLP system we developed did not perform well enough to abstract entire breast pathology reports. The all-or-nothing approach made the scope of work too broad and limited our flexibility to identify breast pathology procedures and results. Our NLP system was also limited by the lack of gold standard data on rare findings and by wide variation in pathology text. Focusing on individual, common elements and improving the standardization of pathology report text may improve performance.
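The two kinds of figures reported above (report-level outcomes under the all-or-nothing rule, and per-finding precision and sensitivity) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the representation of each report as a set of coded findings, the rule that a report with no gold-standard and no NLP codes counts as correctly omitted, and the toy data are all assumptions made for illustration.

```python
from collections import Counter

def score_report(nlp_codes, gold_codes, flagged):
    """Assign one report to an outcome category (hypothetical all-or-nothing rule)."""
    if flagged:
        return "manual_review"        # NLP deferred the whole report to a reviewer
    if not nlp_codes and not gold_codes:
        return "correctly_omitted"    # assumed meaning: nothing breast-related to code
    if nlp_codes == gold_codes:
        return "correct"              # every coded finding matches the gold standard
    return "incorrect"                # any mismatch makes the whole report incorrect

def precision_sensitivity(reports, finding):
    """Per-finding precision and sensitivity across all reports."""
    tp = fp = fn = 0
    for nlp_codes, gold_codes, _flagged in reports:
        in_nlp, in_gold = finding in nlp_codes, finding in gold_codes
        if in_nlp and in_gold:
            tp += 1
        elif in_nlp:
            fp += 1
        elif in_gold:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else None
    sensitivity = tp / (tp + fn) if tp + fn else None
    return precision, sensitivity

# Toy data (made up): (NLP codes, gold-standard codes, flagged for manual review)
reports = [
    ({"invasive ductal"}, {"invasive ductal"}, False),
    ({"invasive ductal", "DCIS"}, {"invasive ductal"}, False),
    (set(), set(), False),
    ({"DCIS"}, {"DCIS", "LCIS"}, True),
    ({"DCIS"}, {"invasive ductal"}, False),
]

outcomes = Counter(score_report(*r) for r in reports)
precision, sensitivity = precision_sensitivity(reports, "invasive ductal")
```

In this toy set, the second report counts as incorrect at the report level even though "invasive ductal" was identified correctly in it, which mirrors how strong per-finding performance can coexist with a low share of fully correct reports.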
