Pneumonia identification using statistical feature selection

OBJECTIVE This paper describes a natural language processing system for the task of pneumonia identification. Based on the information extracted from the narrative reports associated with a patient, the task is to identify whether or not the patient is positive for pneumonia.

DESIGN A binary classifier was employed to identify pneumonia from a dataset of multiple types of clinical notes created for 426 patients during their stay in the intensive care unit. For this purpose, three types of features were considered: (1) word n-grams, (2) Unified Medical Language System (UMLS) concepts, and (3) assertion values associated with pneumonia expressions. System performance was greatly increased by a feature selection approach which uses statistical significance testing to rank features based on their association with the two categories of pneumonia identification.

RESULTS Besides testing our system on the entire cohort of 426 patients (unrestricted dataset), we also used a smaller subset of 236 patients (restricted dataset). The performance of the system was compared with the results of a baseline previously proposed for these two datasets. The best results achieved by the system (85.71 and 81.67 F1-measure) are significantly better than the baseline results (50.70 and 49.10 F1-measure) on the restricted and unrestricted datasets, respectively.

CONCLUSION Using a statistical feature selection approach that allows the feature extractor to consider only the most informative features from the feature space significantly improves the performance over a baseline that uses all the features from the same feature space. Extracting the assertion value for pneumonia expressions further improves the system performance.
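The feature selection idea described in the abstract (ranking features by how strongly they are associated with the positive and negative pneumonia categories, then keeping only the top-ranked ones) can be illustrated with a minimal sketch. The abstract does not name the specific significance test, so the Pearson chi-square statistic used here is an assumption, and the function names `chi_square_2x2`, `rank_features`, and `select_top_k` are hypothetical and not taken from the paper.

```python
from collections import Counter


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for a 2x2 contingency table.

    a: positive patients with the feature, b: negative patients with it,
    c: positive patients without it,       d: negative patients without it.
    (Assumed test; the abstract only says "statistical significance testing".)
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # a degenerate table carries no association signal
    return n * (a * d - b * c) ** 2 / denom


def rank_features(docs, labels):
    """Rank features by association with the two categories, strongest first.

    docs: list of feature sets (e.g. word n-grams), one per patient.
    labels: list of booleans, True for pneumonia-positive patients.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    pos_counts, neg_counts = Counter(), Counter()
    for feats, is_positive in zip(docs, labels):
        (pos_counts if is_positive else neg_counts).update(set(feats))
    scores = {}
    for feat in set(pos_counts) | set(neg_counts):
        a, b = pos_counts[feat], neg_counts[feat]
        scores[feat] = chi_square_2x2(a, b, n_pos - a, n_neg - b)
    # Ties are broken alphabetically so the ranking is deterministic.
    return sorted(scores, key=lambda f: (-scores[f], f))


def select_top_k(docs, labels, k):
    """Keep only the k highest-ranked features in every patient's feature set."""
    keep = set(rank_features(docs, labels)[:k])
    return [set(feats) & keep for feats in docs]


# Toy data, invented purely for illustration.
docs = [{"pneumonia", "fever"}, {"pneumonia", "cough"}, {"fever", "cough"}, {"cough"}]
labels = [True, True, False, False]
ranking = rank_features(docs, labels)
reduced = select_top_k(docs, labels, 1)
```

In this toy example the feature that appears only in positive patients ranks first, and a feature spread evenly across both categories scores zero. The reduced feature sets would then be passed to the binary classifier in place of the full feature space.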
