De-Identification through Named Entity Recognition for Medical Document Anonymization

This paper introduces the system developed by the NLP UNED team for the MEDDOCAN (Medical Document Anonymization) task, held as part of the IberLEF 2019 evaluation workshop. The system, DINER (De-Identification through Named Entity Recognition), consists of a deep neural network built around a core Bi-LSTM structure. Input features were designed to suit the particular characteristics of medical texts, and especially medical reports, which can combine short semi-structured information with long free-text paragraphs. Initial results on a synthetic test corpus of 1000 clinical cases, manually annotated by health documentalists, indicate the potential of the DINER system.