Paper information - Machine Learning Based on Natural Language Processing to Detect Cardiac Failure in Clinical Narratives

This study develops a machine learning algorithm based on natural language processing that automatically detects whether a patient has a “cardiac failure” or “healthy” condition from physician notes in the Research Data Warehouse at CHU Sainte-Justine Hospital. First, word representations were learned using bag-of-words (BoW), term frequency–inverse document frequency (TF-IDF), and neural word embeddings (word2vec). Each representation technique aims to retain the semantic and syntactic information of words in critical care data, which enriches the mutual information of the word representation and benefits the subsequent analysis steps. Second, a machine learning classifier was applied to the word representation vector space created in the previous step to detect whether the patient has cardiac failure or is stable. The classifiers are supervised binary classification algorithms: logistic regression (LR), Gaussian Naive Bayes (GaussianNB), and a multilayer perceptron neural network (MLP-NN). Technically, training these classifiers mainly optimizes the empirical loss. The resulting automatic learning algorithm is intended to achieve high classification performance, measured by accuracy (acc), precision (pre), recall (rec), and F1-score (f1). The results show that the combination of TF-IDF and MLP-NN consistently outperformed the other combinations on all of these metrics. Without any feature selection, the proposed framework yielded acc, pre, rec, and f1 of 84%, 82%, 85%, and 83%, respectively. When feature selection was applied appropriately, the overall performance improved by up to 4% on each metric.
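The two-step pipeline described above (word representation, then supervised binary classification, then evaluation) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy notes and labels are invented, the smoothed IDF formula with L2 normalization is one common TF-IDF variant chosen here, logistic regression trained by batch gradient descent stands in for the classifiers (the abstract reports MLP-NN as the best performer), and all function names are hypothetical.

```python
import math
from collections import Counter

# Invented toy notes for illustration only (not data from the study).
notes = [
    "reduced ejection fraction with dyspnea",
    "dyspnea and peripheral edema noted",
    "normal cardiac function on exam",
    "stable vitals normal exam",
]
labels = [1, 1, 0, 0]  # 1 = cardiac failure, 0 = healthy


def tokenize(text):
    return text.lower().split()


def fit_idf(docs):
    """Smoothed IDF (assumed variant): ln((1 + n) / (1 + df)) + 1."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(tokenize(doc)))
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}


def tfidf_vector(doc, idf, vocab):
    """TF-IDF vector over a fixed vocabulary, scaled to unit L2 norm."""
    counts = Counter(tokenize(doc))
    vec = [counts[term] * idf[term] for term in vocab]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logreg(X, y, lr=0.5, epochs=500):
    """Minimize the empirical log loss with batch gradient descent."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            grad_w = [g + err * xj for g, xj in zip(grad_w, xi)]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(X, w, b):
    return [1 if sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= 0.5 else 0
            for xi in X]


def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for the positive class (label 1)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    pre = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * pre * rec / (pre + rec) if pre + rec else 0.0
    return {"acc": (tp + tn) / len(y_true), "pre": pre, "rec": rec, "f1": f1}


# Step 1: word representation. Step 2: classifier. Then evaluation.
idf = fit_idf(notes)
vocab = sorted(idf)
X = [tfidf_vector(doc, idf, vocab) for doc in notes]
w, b = train_logreg(X, labels)
predictions = predict(X, w, b)
print(classification_metrics(labels, predictions))
```

The metrics here are computed on the training notes only, to keep the sketch short. A real evaluation would use held-out notes, as any reported test performance requires.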
