Natural language processing in pathology: a scoping review

Background: Encoded pathology data are key for medical registries and analyses, but pathology information is often expressed as free text.

Objective: We reviewed and assessed the use of natural language processing (NLP) for encoding pathology documents.

Materials and methods: Papers addressing NLP in pathology were retrieved from PubMed, the Association for Computing Machinery (ACM) Digital Library and the Association for Computational Linguistics (ACL) Anthology. We reviewed and summarised the study objectives; the NLP methods used and their validation; software implementations; performance on the dataset used; and any reported use in practice.

Results: The main objectives of the 38 included papers were encoding and extraction of clinically relevant information from pathology reports. Common approaches were word/phrase matching, probabilistic machine learning and rule-based systems. Five papers (13%) compared different methods on the same dataset. Four papers did not specify the method(s) used. Of the 26 studies that reported F-measure, recall or precision, 18 reported values over 0.9. Proprietary software was the most frequently mentioned category (14 studies); General Architecture for Text Engineering (GATE) was the most applied architecture overall. Practical system use was reported in four papers. Most papers used expert annotation for validation.

Conclusions: Different methods are used in NLP research in pathology, and good performance (high precision and recall, high retrieval/removal rates) is reported for all of them. Lack of validation and of shared datasets precludes performance comparison. More comparative analysis and validation are needed to provide better insight into the performance and merits of these methods.
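For readers unfamiliar with the metrics named in the results, precision, recall and F-measure are conventionally computed by comparing system output with an expert-annotated reference. The sketch below is illustrative only: the function name `precision_recall_f1` and the placeholder codes are assumptions made for this example and are not taken from any of the reviewed papers.

```python
def precision_recall_f1(predicted, gold):
    """Score system-assigned codes against expert-annotated (gold) codes."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    # Precision: share of system-assigned codes that the expert also assigned.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: share of expert-assigned codes that the system found.
    recall = true_positives / len(gold) if gold else 0.0
    # F-measure (F1): harmonic mean of precision and recall.
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f1


# Placeholder codes, invented for illustration (not real terminology codes).
system_codes = {"CODE_A", "CODE_B", "CODE_C"}
expert_codes = {"CODE_A", "CODE_B", "CODE_C", "CODE_D"}

p, r, f = precision_recall_f1(system_codes, expert_codes)
print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

In this toy case the system assigns no wrong codes but misses one, so precision is higher than recall. Scores computed this way on different datasets are not directly comparable, which is the limitation the conclusions point to.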
