Evaluating and combining biomedical named entity recognition systems

This paper evaluates biomedical named entity recognition systems. We compare two such systems, one based on a Hidden Markov Model and one based on Conditional Random Fields and syntactic parsing. Our experiments use automatically generated data as well as manually annotated material, including a new dataset consisting of full biomedical papers. Through this evaluation, we assess the strengths and weaknesses of the systems tested, as well as the challenges that each dataset presents to them.
