Tumor Entity Recognition and Coding for Spanish Electronic Health Records

Abstract This paper describes a two-stage system for tumor entity detection and coding in Spanish health records. The system was submitted to the CANcer TExt Mining Shared Task (CANTEMIST), a challenge held at the IberLEF 2020 Workshop. We compare two kinds of systems for this problem: the first employs feature-based Conditional Random Fields (CRF), and the second is based on deep learning models. The reported experiments show that our proposals and their combination achieve micro-F1 scores of 83.1% and 78.6% on the test data set for the first and second sub-tasks, respectively, and a mean average precision (MAP) of 79.7% on the third sub-task.