An Empirical Evaluation of Deep Learning for ICD-9 Code Assignment using MIMIC-III Clinical Notes

BACKGROUND AND OBJECTIVE Code assignment is of paramount importance in many levels in modern hospitals, from ensuring accurate billing process to creating a valid record of patient care history. However, the coding process is tedious and subjective, and it requires medical coders with extensive training. This study aims to evaluate the performance of deep-learning-based systems to automatically map clinical notes to ICD-9 medical codes. METHODS The evaluations of this research are focused on end-to-end learning methods without manually defined rules. Traditional machine learning algorithms, as well as state-of-the-art deep learning methods such as Recurrent Neural Networks and Convolution Neural Networks, were applied to the Medical Information Mart for Intensive Care (MIMIC-III) dataset. An extensive number of experiments was applied to different settings of the tested algorithm. RESULTS Findings showed that the deep learning-based methods outperformed other conventional machine learning methods. From our assessment, the best models could predict the top 10 ICD-9 codes with 0.6957 F1 and 0.8967 accuracy and could estimate the top 10 ICD-9 categories with 0.7233 F1 and 0.8588 accuracy. Our implementation also outperformed existing work under certain evaluation metrics. CONCLUSION A set of standard metrics was utilized in assessing the performance of ICD-9 code assignment on MIMIC-III dataset. All the developed evaluation tools and resources are available online, which can be used as a baseline for further research.

[1]  Min-Ling Zhang,et al.  A Review on Multi-Label Learning Algorithms , 2014, IEEE Transactions on Knowledge and Data Engineering.

[2]  Jimeng Sun,et al.  RETAIN: An Interpretable Predictive Model for Healthcare using Reverse Time Attention Mechanism , 2016, NIPS.

[3]  Özlem Uzuner,et al.  Three Approaches to Automatic Assignment of ICD-9-CM Codes to Radiology Reports , 2007, AMIA.

[4]  Yoon Kim,et al.  Convolutional Neural Networks for Sentence Classification , 2014, EMNLP.

[5]  Charles Elkan,et al.  Learning to Diagnose with LSTM Recurrent Neural Networks , 2015, ICLR.

[6]  Jürgen Schmidhuber,et al.  Long Short-Term Memory , 1997, Neural Computation.

[7]  Cédrick Fairon,et al.  Machine learning and features selection for semi-automatic ICD-9-CM encoding , 2010, Louhi@NAACL-HLT.

[8]  Jeffrey Pennington,et al.  Semi-Supervised Recursive Autoencoders for Predicting Sentiment Distributions , 2011, EMNLP.

[9]  John F. Hurdle,et al.  Extracting Information from Textual Documents in the Electronic Health Record: A Review of Recent Research , 2008, Yearbook of Medical Informatics.

[10]  Manik Varma,et al.  Multi-label learning with millions of labels: recommending advertiser bid phrases for web pages , 2013, WWW.

[11]  John C. Platt,et al.  Learning Discriminative Projections for Text Similarity Measures , 2011, CoNLL.

[12]  Jeffrey Dean,et al.  Efficient Estimation of Word Representations in Vector Space , 2013, ICLR.

[13]  Gregory D. Hager,et al.  Temporal Convolutional Networks for Action Segmentation and Detection , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[14]  Quoc V. Le,et al.  Sequence to Sequence Learning with Neural Networks , 2014, NIPS.

[15]  Eyke Hüllermeier,et al.  On label dependence and loss minimization in multi-label classification , 2012, Machine Learning.

[16]  M. de Rijke,et al.  Ad Hoc Monitoring of Vocabulary Shifts over Time , 2015, CIKM.

[17]  Karin M. Verspoor,et al.  Big Data in Medicine Is Driving Big Changes , 2014, Yearbook of Medical Informatics.

[18]  Geoffrey E. Hinton,et al.  Speech recognition with deep recurrent neural networks , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.

[19]  Yi Pan,et al.  Automatic ICD-9 coding via deep transfer learning , 2019, Neurocomputing.

[20]  Yuan Luo,et al.  Recurrent Neural Networks for Classifying Relations in Clinical Notes , 2017, AMIA.

[21]  Berthier A. Ribeiro-Neto,et al.  An experimental study in automatically categorizing medical documents , 2001, J. Assoc. Inf. Sci. Technol..

[22]  Yelong Shen,et al.  Learning semantic representations using convolutional neural networks for web search , 2014, WWW.

[23]  K. Bretonnel Cohen,et al.  A shared task involving multi-label classification of clinical free text , 2007, BioNLP@ACL.

[24]  Jimeng Sun,et al.  Using recurrent neural network models for early detection of heart failure onset , 2016, J. Am. Medical Informatics Assoc..

[25]  Yoshua Bengio,et al.  Learning long-term dependencies with gradient descent is difficult , 1994, IEEE Trans. Neural Networks.

[26]  Quoc V. Le,et al.  Distributed Representations of Sentences and Documents , 2014, ICML.

[27]  Henrique M. G. Martins,et al.  Using Structured EHR Data and SVM to Support ICD-9-CM Coding , 2013, 2013 IEEE International Conference on Healthcare Informatics.

[28]  Parisa Rashidi,et al.  Deep EHR: A Survey of Recent Advances in Deep Learning Techniques for Electronic Health Record (EHR) Analysis , 2017, IEEE Journal of Biomedical and Health Informatics.

[29]  W. Bruce Croft,et al.  Automatic Assignment of ICD9 Codes To Discharge Summaries , 1995 .

[30]  Sampo Pyysalo,et al.  How to Train good Word Embeddings for Biomedical NLP , 2016, BioNLP@ACL.

[31]  Oladimeji Farri,et al.  Condensed Memory Networks for Clinical Diagnostic Inferencing , 2016, AAAI.

[32]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[33]  Alex Graves,et al.  Supervised Sequence Labelling with Recurrent Neural Networks , 2012, Studies in Computational Intelligence.

[34]  Vladlen Koltun,et al.  An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling , 2018, ArXiv.

[35]  Walter F. Stewart,et al.  Doctor AI: Predicting Clinical Events via Recurrent Neural Networks , 2015, MLHC.

[36]  Peter Szolovits,et al.  MIMIC-III, a freely accessible critical care database , 2016, Scientific Data.

[37]  Yuan Ling Methods and Techniques for Clinical Text Modeling and Analytics , 2017 .

[38]  Jason Weston,et al.  Natural Language Processing (Almost) from Scratch , 2011, J. Mach. Learn. Res..

[39]  Jeffrey Dean,et al.  Distributed Representations of Words and Phrases and their Compositionality , 2013, NIPS.

[40]  Grigorios Tsoumakas,et al.  Multi-Label Classification: An Overview , 2007, Int. J. Data Warehous. Min..

[41]  Yoshua Bengio,et al.  On the Properties of Neural Machine Translation: Encoder–Decoder Approaches , 2014, SSST@EMNLP.

[42]  C. Anandan,et al.  The Impact of eHealth on the Quality and Safety of Health Care: A Systematic Overview , 2011, PLoS medicine.