Clinical Information Extraction at the CLEF eHealth Evaluation lab 2016

This paper reports on Task 2 of the 2016 CLEF eHealth evaluation lab, which extended the previous information extraction tasks of the ShARe/CLEF eHealth evaluation labs. The task continued with named entity recognition and normalization in French narratives, as offered in CLEF eHealth 2015. Named entity recognition involved ten types of entities, including disorders, defined according to Semantic Groups in the Unified Medical Language System® (UMLS®), which was also used for normalizing the entities. In addition, we introduced a large-scale classification task on French death certificates, which consisted of extracting causes of death as coded in the International Classification of Diseases, tenth revision (ICD10). Participant systems were evaluated against a blind reference standard of 832 titles of scientific articles indexed in MEDLINE, 4 drug monographs published by the European Medicines Agency (EMEA), and 27,850 death certificates, using Precision, Recall and F-measure. In total, seven teams participated, including five in the entity recognition and normalization task and five in the death certificate coding task. Three teams submitted their systems to our newly offered reproducibility track. For entity recognition, the highest performance was achieved on the EMEA corpus, with an overall F-measure of 0.702 for plain entity recognition and 0.529 for normalized entity recognition. For entity normalization, the highest performance was achieved on the MEDLINE corpus, with an overall F-measure of 0.552. For death certificate coding, the highest performance was an F-measure of 0.848.
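
As an illustration of the evaluation measures named above, the following is a minimal sketch of Precision, Recall and F-measure computed over sets of (document, code) pairs. It assumes exact matching and the balanced F-measure (F1); the abstract does not specify the matching criteria or averaging used by the lab, so this is not the official scorer. The function name `precision_recall_f1`, the certificate identifiers and the code assignments are hypothetical toy values.

```python
from typing import Hashable, Iterable, Tuple


def precision_recall_f1(
    predicted: Iterable[Hashable], reference: Iterable[Hashable]
) -> Tuple[float, float, float]:
    """Exact-match precision, recall and balanced F-measure over two sets."""
    pred, ref = set(predicted), set(reference)
    true_positives = len(pred & ref)
    # Guard against empty sets to avoid division by zero.
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(ref) if ref else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f1


# Hypothetical toy data: (certificate id, ICD10 code) pairs, invented for illustration.
reference = {("cert1", "I219"), ("cert1", "E119"), ("cert2", "C349")}
predicted = {("cert1", "I219"), ("cert2", "C349"), ("cert2", "J189")}

p, r, f = precision_recall_f1(predicted, reference)
print(f"P={p:.3f} R={r:.3f} F={f:.3f}")
```

In this toy case, two of the three predicted pairs appear in the reference and two of the three reference pairs are found, so all three measures coincide.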
