Automatic Matching of ICD-10 Codes to Diagnoses in Discharge Letters

This paper presents an approach for automatic mapping of International Classification of Diseases, 10th revision (ICD-10) codes to diagnoses extracted from discharge letters. The proposed algorithm is designed for processing free-text documents in Bulgarian. In patient records, diseases are often described in free text, using terminology, phrases and paraphrases that differ significantly from those used in the ICD-10 classification. Disease recognition, which in practice means assigning standardized ICD codes to disease names, is therefore an important natural language processing (NLP) challenge. The approach is based on a multiclass Support Vector Machine (SVM) method in which each four-character ICD-10 classification code is treated as a single class. The multiclass problem is reduced to multiple binary classifiers, and the final class is chosen by a max-wins voting strategy.
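The voting step can be illustrated with a minimal sketch. It assumes the reduction is pairwise (one binary classifier per pair of classes), which is the usual setting for max-wins voting but is not spelled out here. The function name `max_wins_predict`, the three ICD-10 codes, the two-dimensional feature vectors and the hand-set linear decision functions are all hypothetical stand-ins for trained SVMs and real text features.

```python
from collections import Counter
from itertools import combinations
from typing import Callable, Dict, Sequence, Tuple

# A pairwise classifier maps a feature vector to a signed score:
# a non-negative score favours the first class of the pair, a negative one the second.
PairwiseClassifier = Callable[[Sequence[float]], float]


def max_wins_predict(
    x: Sequence[float],
    classes: Sequence[str],
    classifiers: Dict[Tuple[str, str], PairwiseClassifier],
) -> str:
    """Return the class that wins the most pairwise contests for x."""
    votes = Counter({c: 0 for c in classes})
    for a, b in combinations(classes, 2):
        score = classifiers[(a, b)](x)
        votes[a if score >= 0 else b] += 1
    # Ties go to the class listed first in `classes` (an arbitrary choice).
    return max(classes, key=lambda c: votes[c])


def linear(weights: Sequence[float], bias: float) -> PairwiseClassifier:
    """Build a toy linear decision function standing in for a trained SVM."""
    return lambda x: sum(w * xi for w, xi in zip(weights, x)) + bias


# Hypothetical classes and hand-set decision functions, for illustration only.
classes = ["E11.9", "I11.9", "J18.9"]
classifiers = {
    ("E11.9", "I11.9"): linear([1.0, -1.0], 0.0),
    ("E11.9", "J18.9"): linear([1.0, 0.0], -0.5),
    ("I11.9", "J18.9"): linear([0.0, 1.0], -0.5),
}

print(max_wins_predict([0.9, 0.1], classes, classifiers))  # E11.9
print(max_wins_predict([0.1, 0.9], classes, classifiers))  # I11.9
```

With k classes this scheme needs k(k-1)/2 binary classifiers, so the number of pairwise models grows quadratically with the number of ICD-10 codes considered.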