Document-level classification of CT pulmonary angiography reports based on an extension of the ConText algorithm

In this paper we describe peFinder, an application for document-level classification of CT pulmonary angiography reports. peFinder is based on a generalized version of the ConText algorithm, a simple text processing algorithm for identifying features in clinical report documents. peFinder was used to answer questions about the disease state (pulmonary emboli present or absent), the certainty state of the diagnosis (uncertainty present or absent), the temporal state of an identified pulmonary embolus (acute or chronic), and the technical quality state of the exam (diagnostic or not diagnostic). Gold standard answers for each question were determined from the consensus classifications of three human annotators. peFinder results were compared to those of naive Bayes classifiers using unigrams and bigrams. The sensitivities (and positive predictive values) for peFinder were 0.98 (0.83), 0.86 (0.96), 0.94 (0.93), and 0.60 (0.90) for disease state, quality state, certainty state, and temporal state, respectively, compared to 0.68 (0.77), 0.67 (0.87), 0.62 (0.82), and 0.04 (0.25) for the naive Bayes classifier using unigrams, and 0.75 (0.79), 0.52 (0.69), 0.59 (0.84), and 0.04 (0.25) for the naive Bayes classifier using bigrams.
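For readers less familiar with the two reported metrics, the following is a minimal sketch of how sensitivity and positive predictive value can be computed for one binary document-level question (for example, disease state). The function name `sensitivity_and_ppv` and the label lists are hypothetical and purely illustrative; they are not taken from peFinder or from the study data.

```python
from typing import Sequence, Tuple


def sensitivity_and_ppv(
    gold: Sequence[bool], predicted: Sequence[bool]
) -> Tuple[float, float]:
    """Return (sensitivity, positive predictive value) for binary labels.

    sensitivity = TP / (TP + FN)
    PPV         = TP / (TP + FP)
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    # Count true positives, false negatives, and false positives.
    tp = sum(1 for g, p in zip(gold, predicted) if g and p)
    fn = sum(1 for g, p in zip(gold, predicted) if g and not p)
    fp = sum(1 for g, p in zip(gold, predicted) if not g and p)

    # Return NaN when a denominator is zero (metric undefined).
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    return sensitivity, ppv


# Hypothetical labels for six reports (True = pulmonary embolus present).
# These values are invented for illustration only.
gold_labels = [True, True, True, False, False, True]
system_labels = [True, True, False, True, False, True]

sens, ppv = sensitivity_and_ppv(gold_labels, system_labels)
print(f"sensitivity={sens:.2f}, PPV={ppv:.2f}")
```

In this toy example there are three true positives, one false negative, and one false positive, so both metrics equal 0.75. Each pair in the abstract, such as 0.98 (0.83), corresponds to one such computation per question against the consensus gold standard.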
