LASIGE: using Conditional Random Fields and ChEBI ontology

To participate in the SemEval 2013 challenge on the recognition and classification of drug names, we adapted our chemical entity recognition approach, which uses Conditional Random Fields to recognize chemical terms and lexical similarity to resolve the recognized entities to the ChEBI ontology. We obtained promising results, with a best F-measure of 0.81 on the partial matching task when using post-processing. Using only Conditional Random Fields, the results are slightly lower but still achieve a good F-measure. Using the ChEBI ontology yielded a significant improvement in precision (best precision of 0.93 on the partial matching task), which indicates that taking advantage of an ontology can be extremely useful for enhancing chemical entity recognition.
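As a rough illustration of how resolving recognized mentions against an ontology can act as a precision-oriented post-processing filter, the sketch below matches candidate mentions to a toy name index by string similarity and discards mentions with no sufficiently similar entry. It is a minimal sketch under stated assumptions, not the system's implementation: the index `TOY_ONTOLOGY`, its identifiers, the helper names `resolve` and `filter_mentions`, the 0.8 threshold, and the use of Python's `difflib` ratio as the similarity measure are all hypothetical stand-ins for the ChEBI resolution step described above.

```python
from difflib import SequenceMatcher

# Toy stand-in for an ontology name index. Names and IDs are hypothetical
# placeholders, not real ChEBI entries.
TOY_ONTOLOGY = {
    "aspirin": "TOY:1",
    "caffeine": "TOY:2",
    "ibuprofen": "TOY:3",
}


def resolve(mention, ontology=TOY_ONTOLOGY, threshold=0.8):
    """Return (ontology_id, score) for the most similar name, or None.

    The similarity is difflib's ratio, used here only as an assumed
    substitute for a lexical similarity measure.
    """
    query = mention.lower().strip()
    best_name, best_score = None, 0.0
    for name in ontology:
        score = SequenceMatcher(None, query, name).ratio()
        if score > best_score:
            best_name, best_score = name, score
    if best_name is None or best_score < threshold:
        return None
    return ontology[best_name], best_score


def filter_mentions(mentions, threshold=0.8):
    """Keep only mentions that resolve to some ontology entry.

    Dropping unresolved mentions trades recall for precision.
    """
    return [m for m in mentions if resolve(m, threshold=threshold) is not None]
```

For example, `filter_mentions(["asprin", "the", "caffeine"])` keeps the misspelled and exact drug names and drops the unrelated token, which is the kind of filtering that would raise precision at some cost in recall.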
