Extracting Smoking Status from Electronic Health Records Using NLP and Deep Learning.

Half a million people die every year from smoking-related issues across the United States. It is essential to identify individuals who are tobacco-dependent in order to implement preventive measures. In this study, we investigate the effectiveness of deep learning models for extracting the smoking status of patients from clinical progress notes. A Natural Language Processing (NLP) pipeline was built that cleans the progress notes prior to processing by three deep neural networks: a CNN, a unidirectional LSTM, and a bidirectional LSTM. Each of these models was trained with either a pre-trained or a post-trained word embedding layer. Three traditional machine learning models were also employed for comparison against the neural networks. Each model produced both binary and multi-class classifications. Our results showed that the CNN model with a pre-trained embedding layer performed best for both binary and multi-class classification.
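The abstract does not say which cleaning steps the pipeline applies or how notes are passed to the embedding layer. As a minimal sketch only, assuming a common setup (lowercasing, stripping punctuation, mapping tokens to integer ids, and padding to a fixed length), the preprocessing ahead of an embedding layer might look like the following. The function names `clean_note`, `build_vocab`, and `encode`, the sample notes, and the `max_len` value are all hypothetical and are not taken from the paper.

```python
import re

def clean_note(text):
    """Lowercase a note and keep only letters, digits, and single spaces."""
    text = text.lower()
    text = re.sub(r"[^a-z0-9\s]", " ", text)  # replace punctuation with spaces
    return re.sub(r"\s+", " ", text).strip()  # collapse whitespace runs

def build_vocab(notes):
    """Assign an integer id to each token; 0 is padding, 1 is unknown."""
    vocab = {"<pad>": 0, "<unk>": 1}
    for note in notes:
        for token in clean_note(note).split():
            vocab.setdefault(token, len(vocab))
    return vocab

def encode(note, vocab, max_len):
    """Map a note to a fixed-length id sequence (truncate or right-pad)."""
    ids = [vocab.get(tok, vocab["<unk>"]) for tok in clean_note(note).split()]
    ids = ids[:max_len]
    return ids + [vocab["<pad>"]] * (max_len - len(ids))

# Made-up example notes (not from the study's data).
notes = ["Pt. smokes 1 pack/day.", "Denies tobacco use; never smoked."]
vocab = build_vocab(notes)
encoded = [encode(n, vocab, max_len=8) for n in notes]
```

Fixed-length integer sequences of this kind are the usual input to an embedding layer, whether its weights are loaded from pre-trained vectors or learned during training; the CNN and LSTM models would then operate on the embedded sequence.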

Umit Topaloglu | Suraj Rajendran
