ICD-10 coding of death certificates with the NCBO and SIFR Annotators at CLEF eHealth 2017

The SIFR BioPortal is an open platform for hosting French biomedical ontologies and terminologies, based on the technology developed by the US National Center for Biomedical Ontology (NCBO). The portal facilitates the use and promotion of terminologies and ontologies by offering a set of services, including semantic annotation. The SIFR Annotator (http://bioportal.lirmm.fr/annotator) is a publicly accessible, easy-to-use ontology-based annotation tool for processing French text and facilitating semantic indexing. The web service relies on the ontology content (preferred labels and synonyms), on the semantics of the ontologies (is-a hierarchies) and on their mappings. The SIFR BioPortal also makes it possible to query the original NCBO Annotator for English text through a dedicated proxy that extends its original functionality. In this paper, we present a preliminary performance evaluation of the generic annotation web service (i.e., not specifically customized) for coding death certificates, i.e., annotating them with ICD-10 codes. The evaluation is performed against the manually annotated CépiDC/CDC corpus of CLEF eHealth 2017 Task 1. For this purpose, we built custom SKOS vocabularies from the CépiDC/CDC dictionaries as well as from the training and development corpora, for all three tasks, using a most-frequent-code heuristic to assign ambiguous labels. We then submitted the vocabularies to the NCBO and SIFR BioPortals and used their annotation services on the Task 1 datasets. For our best run on each corpus, we obtained the following results: English raw corpus (69.08% P, 51.37% R, 58.92% F1); French raw corpus (54.11% P, 48.00% R, 50.87% F1); French aligned corpus (50.63% P, 52.97% R, 51.77% F1).
