Fast and simple semantic class assignment for biomedical text

A simple and accurate method for assigning broad semantic classes to text strings is presented. The method maps text strings to terms in ontologies through a pipeline of four matching stages: exact match, normalized-string match, headword match, and stemmed-headword match. The results of three experiments evaluating the technique are given. Five semantic classes are evaluated against the CRAFT corpus of full-text journal articles. Twenty semantic classes are evaluated against the corresponding full ontologies, i.e. by reflexive matching. One semantic class is evaluated against a structured test suite. On the corpus, when evaluating against only the ontologies in the corpus, precision, recall, and F-measure are 67.06/78.49/72.32 micro-averaged and 69.84/83.12/75.31 macro-averaged. Accuracy on the corpus when evaluating against all twenty semantic classes ranges from 77.12% to 95.73%. Reflexive matching is generally successful, but reveals a small number of errors in the implementation. Evaluation with the structured test suite reveals a number of characteristics of the performance of the approach.
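The four-stage pipeline described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy term lists, the function names, the choice of the rightmost token as headword, and the crude plural-stripping stemmer are all hypothetical stand-ins, and the abstract does not say how ambiguous matches are resolved (here the first class indexed wins).

```python
import re

# Toy term lists standing in for full ontologies (hypothetical data).
ONTOLOGY_TERMS = {
    "cell": ["T cell", "epithelial cell", "neuron"],
    "biological_process": ["apoptosis", "cell migration", "gene expression"],
}

def normalize(text):
    """Lowercase, turn punctuation into spaces, collapse whitespace."""
    return " ".join(re.sub(r"[^a-z0-9]+", " ", text.lower()).split())

def headword(text):
    """Assumption: the head of a noun phrase is its rightmost token."""
    tokens = normalize(text).split()
    return tokens[-1] if tokens else ""

def stem(word):
    """Crude plural stripper standing in for a real stemmer (assumption)."""
    if word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        return word[:-1]
    return word

def build_indexes(ontology_terms):
    """Build one lookup table per pipeline stage; first class seen wins."""
    exact, normalized, heads, stemmed_heads = {}, {}, {}, {}
    for sem_class, terms in ontology_terms.items():
        for term in terms:
            exact.setdefault(term, sem_class)
            normalized.setdefault(normalize(term), sem_class)
            heads.setdefault(headword(term), sem_class)
            stemmed_heads.setdefault(stem(headword(term)), sem_class)
    return exact, normalized, heads, stemmed_heads

def assign_class(text, indexes):
    """Try each stage in order; return (semantic class, stage that matched)."""
    exact, normalized, heads, stemmed_heads = indexes
    stages = [
        ("exact", exact, text),
        ("normalized", normalized, normalize(text)),
        ("headword", heads, headword(text)),
        ("stemmed headword", stemmed_heads, stem(headword(text))),
    ]
    for stage, index, key in stages:
        if key in index:
            return index[key], stage
    return None, None

indexes = build_indexes(ONTOLOGY_TERMS)
for mention in ["T cell", "t-cell", "activated B cell", "dendritic cells"]:
    print(mention, "->", assign_class(mention, indexes))
```

Ordering the stages from strictest to loosest means a string is classified by the most reliable evidence available, and the returned stage name makes it easy to see how much each stage contributes.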
