BERT-XML: Large Scale Automated ICD Coding Using BERT Pretraining

Clinical interactions are initially recorded and documented in free-text medical notes. ICD coding is the task of classifying and coding all diagnoses, symptoms, and procedures associated with a patient's visit. The process is often manual, and it is extremely time-consuming and expensive for hospitals. In this paper, we propose a machine learning model, BERT-XML, for large-scale automated ICD coding from EHR notes, utilizing recently developed unsupervised pretraining techniques that have achieved state-of-the-art performance on a variety of NLP tasks. We train a BERT model from scratch on EHR notes, learning a vocabulary better suited to EHR tasks and thereby outperforming off-the-shelf models. We adapt the BERT architecture for ICD coding with multi-label attention. While other works focus on small public medical datasets, we have produced the first large-scale ICD-10 classification model, trained on millions of EHR notes to predict thousands of unique ICD codes.
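
The sketch below illustrates the kind of multi-label attention head described above: one learned attention vector per ICD code pools the BERT token representations into a label-specific document vector, which is then scored with a sigmoid per label. This is a minimal illustration, not the authors' released implementation; the class name `BertMultiLabelAttention`, the parameter `num_labels`, and the use of the HuggingFace `transformers` API are assumptions made for the example.

```python
# Minimal sketch of BERT with a per-label attention head for ICD coding.
# Assumes the HuggingFace `transformers` library; hyperparameters and
# names are illustrative, not taken from the paper.
import torch
import torch.nn as nn
from transformers import BertModel


class BertMultiLabelAttention(nn.Module):
    def __init__(self, bert_name: str, num_labels: int):
        super().__init__()
        self.bert = BertModel.from_pretrained(bert_name)
        hidden = self.bert.config.hidden_size
        # One attention query per ICD code: each label attends over the
        # token representations and pools its own document vector.
        self.label_queries = nn.Parameter(torch.randn(num_labels, hidden))
        self.classifier = nn.Linear(hidden, 1)

    def forward(self, input_ids, attention_mask):
        # Contextual token embeddings: (batch, seq_len, hidden)
        tokens = self.bert(input_ids=input_ids,
                           attention_mask=attention_mask).last_hidden_state
        # Per-label attention scores over tokens: (batch, labels, seq_len)
        scores = torch.einsum("lh,bsh->bls", self.label_queries, tokens)
        scores = scores.masked_fill(attention_mask[:, None, :] == 0, -1e9)
        weights = torch.softmax(scores, dim=-1)
        # Label-specific pooled representations: (batch, labels, hidden)
        pooled = torch.einsum("bls,bsh->blh", weights, tokens)
        # One logit per label; train with sigmoid + binary cross-entropy.
        logits = self.classifier(pooled).squeeze(-1)
        return logits
```

In this setup, training would use `nn.BCEWithLogitsLoss` over the per-label logits, since each note can carry many ICD codes simultaneously.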
