Paper information - Automated ICD-9-CM medical coding of diabetic patient's clinical reports

Assigning ICD-9-CM codes to patients' clinical reports is a costly and tedious process performed manually by medical personnel, estimated to cost about $25 billion per year in the United States. Automating this process has long been a goal of researchers, but it remains an unsolved problem because of the inherent difficulty of processing unstructured clinical text. Here the problem is formulated as multi-label supervised learning: the input is the text of a report, and the output is the set of ICD-9-CM codes assigned to it. Several variations of two neural-network-based models are investigated, the Bag-of-Tricks model and the Convolutional Neural Network (CNN). The models are trained on the diabetic-patient subset of the freely available MIMIC-III dataset. The results show that a CNN with three parallel convolutional layers achieves F1 scores of 44.51% for five-digit codes and 51.73% for three-digit (rolled-up) codes. Although fully automated coding is not achievable, these results suggest that automated classification could aid clinical staff by selecting the most probable codes.
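To make the difference between five-digit and rolled-up evaluation concrete, the following is a minimal sketch, not the authors' evaluation code. It assumes codes are stored without the decimal point (as in the MIMIC-III diagnosis table), that rolling up means truncating to the category (the first three characters, or four for E codes), and that F1 is micro-averaged; the abstract does not state which averaging was used. The function names `roll_up` and `micro_f1` and the toy labels are hypothetical.

```python
def roll_up(code: str) -> str:
    """Truncate an ICD-9-CM diagnosis code to its category.

    Assumption: E codes keep four characters (e.g. E849), every other
    code keeps three (e.g. 250, V58).
    """
    code = code.replace(".", "").strip().upper()
    return code[:4] if code.startswith("E") else code[:3]


def micro_f1(true_sets, pred_sets):
    """Micro-averaged F1 over per-report sets of codes."""
    tp = fp = fn = 0
    for true, pred in zip(true_sets, pred_sets):
        tp += len(true & pred)   # codes predicted and actually assigned
        fp += len(pred - true)   # codes predicted but not assigned
        fn += len(true - pred)   # codes assigned but not predicted
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy example: two reports, each with a set of assigned codes.
true_codes = [{"25000", "4019"}, {"25001", "V5867"}]
pred_codes = [{"25002", "4019"}, {"25001"}]

full_f1 = micro_f1(true_codes, pred_codes)
rolled_f1 = micro_f1(
    [{roll_up(c) for c in codes} for codes in true_codes],
    [{roll_up(c) for c in codes} for codes in pred_codes],
)

print(round(full_f1, 4))    # 0.5714
print(round(rolled_f1, 4))  # 0.8571
```

In this toy case the prediction `25002` is wrong at five digits but correct once both it and the true code `25000` are truncated to `250`, which illustrates why the rolled-up score can be higher than the five-digit score.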
