Paper information - Erasmus MC at CLEF eHealth 2016: Concept Recognition and Coding in French Texts

We participated in task 2 of the CLEF eHealth 2016 challenge. Two subtasks were addressed: entity recognition and normalization in a corpus of French drug labels and Medline titles, and ICD-10 coding of French death certificates. For both subtasks we used a dictionary-based approach. For entity recognition and normalization, we used Peregrine, our open-source indexing engine, with a dictionary based on French terms in the Unified Medical Language System (UMLS), supplemented with English UMLS terms that were translated into French with automatic translators. For ICD-10 coding, we used the Solr text tagger, together with one of two ICD-10 terminologies derived from the task training material. To reduce the number of false-positive detections, we implemented several post-processing steps. On the challenge test set, our best system obtained F-scores of 0.702 and 0.651 for entity recognition in the drug labels and in the Medline titles, respectively. For entity normalization, F-scores were 0.529 and 0.474. On the test set for ICD-10 coding, our system achieved an F-score of 0.848 (precision 0.886, recall 0.813). These scores were substantially higher than the average score of the systems that participated in the challenge.
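As a quick sanity check on the reported ICD-10 coding result, the sketch below recomputes the F-score from the stated precision and recall. It assumes the F-score is the balanced F1 (the harmonic mean of precision and recall), which the abstract does not state explicitly; the function name `f_score` is illustrative and not taken from the paper.

```python
def f_score(precision: float, recall: float) -> float:
    """Balanced F-score (F1): harmonic mean of precision and recall.

    Assumption: the abstract's "F-score" is F1; the abstract does not say so.
    """
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both values are zero
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported for ICD-10 coding on the test set.
icd10_f = f_score(0.886, 0.813)
print(round(icd10_f, 3))  # prints 0.848
```

Under this assumption, the recomputed value agrees with the reported F-score of 0.848.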
