Towards BERT-based Automatic ICD Coding: Limitations and Opportunities

Automatic ICD coding is the task of assigning codes from the International Classification of Diseases (ICD) to medical notes. These codes describe the state of the patient and have multiple applications, e.g., computer-assisted diagnosis or epidemiological studies. ICD coding is a challenging task due to the complexity and length of medical notes. Unlike the general trend in natural language processing, no transformer model has been reported to reach high performance on this task. Here, we investigate ICD coding in detail using PubMedBERT, a state-of-the-art transformer model for biomedical language understanding. We find that the difficulty of fine-tuning the model on long pieces of text is the main limitation of BERT-based models for ICD coding. We run extensive experiments and show that, despite the gap with the current state of the art, pretrained transformers can reach competitive performance using relatively small portions of text. We point to better methods for aggregating information from long texts as the main need for improving BERT-based ICD coding.
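The aggregation problem the abstract points to can be made concrete with a minimal sketch: a clinical note typically exceeds BERT's 512-token input limit, so one common baseline splits the note into overlapping chunks, scores each chunk independently, and pools the per-code probabilities across chunks. The function names, stride value, and max-pooling choice below are illustrative assumptions, not the paper's actual implementation.

```python
# Sketch of a chunk-and-aggregate baseline for multi-label ICD coding over
# long notes. `score_chunk` stands in for a fine-tuned encoder (e.g.,
# PubMedBERT with a sigmoid classification head); it is an assumed interface.

from typing import Callable, List

MAX_TOKENS = 512  # standard BERT input length limit
STRIDE = 384      # overlap between consecutive windows (illustrative choice)


def chunk_tokens(token_ids: List[int],
                 max_len: int = MAX_TOKENS,
                 stride: int = STRIDE) -> List[List[int]]:
    """Split a long token-id sequence into overlapping fixed-size windows."""
    chunks = []
    start = 0
    while start < len(token_ids):
        chunks.append(token_ids[start:start + max_len])
        if start + max_len >= len(token_ids):
            break
        start += stride
    return chunks


def aggregate_icd_scores(token_ids: List[int],
                         score_chunk: Callable[[List[int]], List[float]]
                         ) -> List[float]:
    """Score each chunk and max-pool per-ICD-code probabilities.

    Max-pooling reflects the multi-label setting: a code applies to the
    note if any chunk provides strong evidence for it.
    """
    chunk_scores = [score_chunk(chunk) for chunk in chunk_tokens(token_ids)]
    return [max(per_code) for per_code in zip(*chunk_scores)]
```

Max-pooling is only one of several aggregation strategies (mean-pooling and attention over chunk embeddings are common alternatives); the abstract's conclusion suggests that finding better such strategies is the key open problem for BERT-based ICD coding.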
