Using Deep Learning for Automatic Icd-10 Classification from Free-Text Data

Background: Classifying diseases into ICD codes has mainly relied on human reading a large amount of written materials, such as discharge diagnoses, chief complaints, medical history, and operation records as the basis for classification. Coding is both laborious and time consuming because a disease coder with professional abilities takes about 20 minutes per case in average. Therefore, an automatic code classification system can significantly reduce the human effort. Objectives: This paper aims at constructing a machine learning model for ICD-10 coding, where the model is to automatically determine the corresponding diagnosis codes solely based on free-text medical notes. Methods: In this paper, we apply Natural Language Processing (NLP) and Recurrent Neural Network (RNN) architecture to classify ICD-10 codes from natural language texts with supervised learning. Results: In the experiments on large hospital data, our predicting result can reach F1-score of 0.62 on ICD-10-CM code. Conclusion: The developed model can significantly reduce manpower in coding time compared with a professional coder.

[1]  Hongfang Liu,et al.  MedSTS: a resource for clinical semantic textual similarity , 2018, Language Resources and Evaluation.

[2]  Bhargav Srinivasa Desikan,et al.  Natural Language Processing and Computational Linguistics , 2018 .

[3]  Jimeng Sun,et al.  LEAP: Learning to Prescribe Effective and Safe Treatment Combinations for Multimorbidity , 2017, KDD.

[4]  Rich Caruana,et al.  An empirical comparison of supervised learning algorithms , 2006, ICML.

[5]  Daniel J. Feller,et al.  Using Clinical Notes and Natural Language Processing for Automated HIV Risk Assessment , 2017, Journal of acquired immune deficiency syndromes.

[6]  Jeffrey Dean,et al.  Scalable and accurate deep learning with electronic health records , 2018, npj Digital Medicine.

[7]  Yoshua Bengio,et al.  Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling , 2014, ArXiv.

[8]  Barbara J. Grosz,et al.  Natural-Language Processing , 1982, Artificial Intelligence.

[9]  Jeffrey Dean,et al.  Distributed Representations of Words and Phrases and their Compositionality , 2013, NIPS.

[10]  Noémie Elhadad,et al.  Automated methods for the summarization of electronic health records , 2015, J. Am. Medical Informatics Assoc..

[11]  Jeffrey Pennington,et al.  GloVe: Global Vectors for Word Representation , 2014, EMNLP.

[12]  Richárd Farkas,et al.  Automatic construction of rule-based ICD-9-CM coding systems , 2008, BMC Bioinformatics.

[13]  Jürgen Schmidhuber,et al.  LSTM recurrent networks learn simple context-free and context-sensitive languages , 2001, IEEE Trans. Neural Networks.

[14]  Steven Bird,et al.  NLTK: The Natural Language Toolkit , 2002, ACL.