ICB-UMA at CLEF e-Health 2020 Task 1: Automatic ICD-10 coding in Spanish with BERT

. This working notes paper presents our contribution to the CLEF eHealth 2020 Task 1. Our team has participated in the CodiEsp-D subtask, the first shared task consisted in the automatic clinical coding of medical cases in Spanish, annotated with ICD-10-CM codes. We tackled the task as a multi-label classification problem using BERT model [4]. With the aim of leveraging all the language modeling capacities of the deep bidirectional encoder architecture of BERT, we developed a tailored approach to annotate short fragments of text extracted from the long clinical cases present in the CodiEsp corpus and use them as input to the model. Two publicly available Spanish versions of BERT, namely BETO [3] and BERT-SciELO [1], were fine-tuned on the CodiEsp-D corpus extended by a set of abstracts annotated with ICD-10 codes, following our fragment-based classification approach. BERT-SciELO, a BERT-Base model pre-trained from scratch on an unlabeled corpus of biomedical articles in Spanish, achieved the best results among our three submitted systems, obtaining a final Mean Average Precision (MAP) metric score of 0.482 on the evaluation set.

[1]  K. Verspoor,et al.  Testing Contextualized Word Embeddings to Improve NER in Spanish Clinical Case Narratives , 2020, IEEE Access.

[2]  Jes'us Villalba,et al.  Hierarchical Transformers for Long Document Classification , 2019, 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).

[3]  H. Suominen,et al.  Overview of the CLEF eHealth Evaluation Lab 2019 , 2019, CLEF.

[4]  Jianfeng Gao,et al.  On the Variance of the Adaptive Learning Rate and Beyond , 2019, ICLR.

[5]  Yi Pan,et al.  Automated ICD-9 Coding via A Deep Learning Approach , 2019, IEEE/ACM Transactions on Computational Biology and Bioinformatics.

[6]  Xuanjing Huang,et al.  How to Fine-Tune BERT for Text Classification? , 2019, CCL.

[7]  Wei-Hung Weng,et al.  Publicly Available Clinical BERT Embeddings , 2019, Proceedings of the 2nd Clinical Natural Language Processing Workshop.

[8]  Jingqi Wang,et al.  Enhancing Clinical Concept Extraction with Contextual Embedding , 2019, J. Am. Medical Informatics Assoc..

[9]  Jaewoo Kang,et al.  BioBERT: a pre-trained biomedical language representation model for biomedical text mining , 2019, Bioinform..

[10]  Pierre Zweigenbaum,et al.  CLEF eHealth 2018 Multilingual Information Extraction Task Overview: ICD10 Coding of Death Certificates in French, Hungarian and Italian , 2018, CLEF.

[11]  Parisa Rashidi,et al.  Deep EHR: A Survey of Recent Advances in Deep Learning Techniques for Electronic Health Record (EHR) Analysis , 2017, IEEE Journal of Biomedical and Health Informatics.

[12]  Lukasz Kaiser,et al.  Attention is All you Need , 2017, NIPS.

[13]  Martin Wattenberg,et al.  Google’s Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation , 2016, TACL.

[14]  K. Bretonnel Cohen,et al.  Clinical Information Extraction at the CLEF eHealth Evaluation lab 2016 , 2016, CLEF.

[15]  Frank D. Wood,et al.  Diagnosis code assignment: models and evaluation metrics , 2013, J. Am. Medical Informatics Assoc..

[16]  Robert A. Jenders,et al.  A systematic literature review of automated clinical coding and classification systems , 2010, J. Am. Medical Informatics Assoc..

[17]  K. Bretonnel Cohen,et al.  A shared task involving multi-label classification of clinical free text , 2007, BioNLP@ACL.

[18]  Serguei V. S. Pakhomov,et al.  Automating the assignment of diagnosis codes to patient encounters using example-based and machine learning techniques. , 2006, Journal of the American Medical Informatics Association : JAMIA.

[19]  Aitor Gonzalez-Agirre,et al.  Overview of Automatic Clinical Coding: Annotations, Guidelines, and Solutions for non-English Clinical Cases at CodiEsp Track of CLEF eHealth 2020 , 2020, CLEF.

[20]  Ming-Wei Chang,et al.  BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.

[21]  Ulf Leser,et al.  Classifying German Animal Experiment Summaries with Multi-lingual BERT at CLEF eHealth 2019 Task 1 , 2019, CLEF.

[22]  K. Bretonnel Cohen,et al.  CLEF eHealth 2017 Multilingual Information Extraction task Overview: ICD10 Coding of Death Certificates in English and French , 2017, CLEF.