Paper Information - Use of Semantic Features to Classify Patient Smoking Status

The recent i2b2 NLP Challenge smoking classification task offers a rare chance to compare different natural language processing techniques on actual clinical data. We compare the performance of a classifier that relies on semantic features generated by an unmodified version of MedLEE, a clinical NLP engine, with that of a classifier using lexical features. We also compare the performance of supervised classifiers with that of rule-based symbolic classifiers. Our baseline supervised classifier with lexical features yields a microaveraged F-measure of 0.81. Our rule-based classifier using MedLEE semantic features performs better, with an F-measure of 0.83. Our supervised classifier trained with semantic MedLEE features is competitive with the top-performing smoking classifier in the i2b2 NLP Challenge, with microaveraged precision of 0.90, recall of 0.89, and F-measure of 0.89.
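The abstract reports microaveraged precision, recall, and F-measure without defining them. As a minimal sketch of the standard definition (not the authors' evaluation code), microaveraging pools true positives, false positives, and false negatives over all classes before computing the metrics. The function names, class labels, and counts below are hypothetical and chosen only for illustration.

```python
from typing import Dict, Tuple


def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def micro_prf(counts: Dict[str, Tuple[int, int, int]]) -> Tuple[float, float, float]:
    """Microaveraged precision, recall, and F-measure.

    `counts` maps each class label to (true positives, false positives,
    false negatives). Counts are summed across classes first, so frequent
    classes weigh more than rare ones.
    """
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f_measure(precision, recall)


if __name__ == "__main__":
    # Hypothetical per-class counts, NOT taken from the paper.
    example_counts = {
        "CURRENT SMOKER": (8, 2, 3),
        "NON-SMOKER": (14, 2, 2),
        "UNKNOWN": (60, 3, 1),
    }
    p, r, f = micro_prf(example_counts)
    print(f"precision={p:.2f} recall={r:.2f} F={f:.2f}")
```

Under this definition, the reported precision of 0.90 and recall of 0.89 give a harmonic mean that rounds to 0.89, which agrees with the stated F-measure.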

Peter D. Stetson | Noémie Elhadad | Patrick J. McCormick
