Transfer learning applied to text classification in Spanish radiological reports

Pre-trained text encoders have rapidly advanced the state of the art on many Natural Language Processing (NLP) tasks. This paper applies transfer learning methods to the automatic detection of codes in radiological reports written in Spanish. Assigning codes to a clinical document is a popular task in NLP and in the biomedical domain. These codes can be of two types: standard classifications (e.g. ICD-10) or codes specific to each clinic or hospital. In this study we present a system that uses the codes of a specific radiology clinic. The dataset is composed of 208,167 radiology reports labeled with 89 different codes. We evaluated three BERT-based approaches that support Spanish on this corpus: Multilingual BERT, BETO and XLM. The system reaches an F1-score of 70% with a pre-trained multilingual model.
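As a point of reference for the reported metric, the following is a minimal sketch of how an F1-score can be computed over predicted codes. It assumes one code per report and shows both micro- and macro-averaging, since the abstract does not state which averaging was used. The function name `f1_scores` and the example code labels are hypothetical and do not come from the dataset.

```python
from collections import Counter


def f1_scores(gold, pred):
    """Return (micro_f1, macro_f1), assuming exactly one code per report."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1          # correct code
        else:
            fp[p] += 1          # predicted code was wrong
            fn[g] += 1          # gold code was missed

    def f1(t, f_pos, f_neg):
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when there is nothing to count
        denom = 2 * t + f_pos + f_neg
        return 2 * t / denom if denom else 0.0

    labels = set(gold) | set(pred)
    if not labels:
        return 0.0, 0.0
    # Macro: average the per-code F1, so rare codes weigh as much as frequent ones
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    # Micro: pool the counts over all codes before computing F1
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return micro, macro


# Hypothetical clinic-specific codes, for illustration only
gold = ["RX_TORAX", "RX_TORAX", "TC_CRANEO", "RM_RODILLA"]
pred = ["RX_TORAX", "TC_CRANEO", "TC_CRANEO", "RM_RODILLA"]
micro, macro = f1_scores(gold, pred)
```

With 89 codes of likely uneven frequency, the two averages can differ noticeably, so it helps to state which one a reported figure refers to.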
