Automatic Assignment of ICD-10 Codes to Diagnostic Texts using Transformer-Based Techniques

A fundamental task in epidemiology, statistics, and health informatics is to associate a standardized meaning with textual expressions, enabling their retrieval, aggregation, and interpretation. Among the relevant expressions, those mentioning health conditions and diagnoses are of paramount importance and can be found in almost any clinical document, including death certificates. These expressions are usually coded with the International Classification of Diseases (ICD). In this paper, we employ both classical machine learning (ML) and BERT-based models to automatically classify diagnostic texts extracted from death certificates. We demonstrate the effectiveness of the proposed approach through a set of experiments with multiple sets of features and variants of the algorithms. Our results show that BERT-based models, in particular those pre-trained on the specific domain, outperform classical ML algorithms, reaching an accuracy of 0.952 and an F1-score of 0.943.