论文信息 - Using Deep Learning for Automatic Icd-10 Classification from Free-Text Data

Using Deep Learning for Automatic Icd-10 Classification from Free-Text Data

Background: Classifying diseases into ICD codes has mainly relied on human reading a large amount of written materials, such as discharge diagnoses, chief complaints, medical history, and operation records as the basis for classification. Coding is both laborious and time consuming because a disease coder with professional abilities takes about 20 minutes per case in average. Therefore, an automatic code classification system can significantly reduce the human effort. Objectives: This paper aims at constructing a machine learning model for ICD-10 coding, where the model is to automatically determine the corresponding diagnosis codes solely based on free-text medical notes. Methods: In this paper, we apply Natural Language Processing (NLP) and Recurrent Neural Network (RNN) architecture to classify ICD-10 codes from natural language texts with supervised learning. Results: In the experiments on large hospital data, our predicting result can reach F1-score of 0.62 on ICD-10-CM code. Conclusion: The developed model can significantly reduce manpower in coding time compared with a professional coder.

[1] Hongfang Liu,et al. MedSTS: a resource for clinical semantic textual similarity , 2018, Language Resources and Evaluation.

[2] Bhargav Srinivasa Desikan,et al. Natural Language Processing and Computational Linguistics , 2018 .

[3] Jimeng Sun,et al. LEAP: Learning to Prescribe Effective and Safe Treatment Combinations for Multimorbidity , 2017, KDD.

[4] Rich Caruana,et al. An empirical comparison of supervised learning algorithms , 2006, ICML.

[5] Daniel J. Feller,et al. Using Clinical Notes and Natural Language Processing for Automated HIV Risk Assessment , 2017, Journal of acquired immune deficiency syndromes.

[6] Jeffrey Dean,et al. Scalable and accurate deep learning with electronic health records , 2018, npj Digital Medicine.

[7] Yoshua Bengio,et al. Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling , 2014, ArXiv.

[8] Barbara J. Grosz,et al. Natural-Language Processing , 1982, Artificial Intelligence.

[9] Jeffrey Dean,et al. Distributed Representations of Words and Phrases and their Compositionality , 2013, NIPS.

[10] Noémie Elhadad,et al. Automated methods for the summarization of electronic health records , 2015, J. Am. Medical Informatics Assoc..

[11] Jeffrey Pennington,et al. GloVe: Global Vectors for Word Representation , 2014, EMNLP.

[12] Richárd Farkas,et al. Automatic construction of rule-based ICD-9-CM coding systems , 2008, BMC Bioinformatics.

[13] Jürgen Schmidhuber,et al. LSTM recurrent networks learn simple context-free and context-sensitive languages , 2001, IEEE Trans. Neural Networks.

[14] Steven Bird,et al. NLTK: The Natural Language Toolkit , 2002, ACL.