Recognising Nested Named Entities in Biomedical Text

Although recent named entity (NE) annotation efforts involve the markup of nested entities, there has been limited focus on recognising such nested structures. This paper introduces and compares three techniques for modelling and recognising nested entities by means of a conventional sequence tagger. The methods are tested and evaluated on two biomedical data sets that contain entity nesting. All methods yield an improvement over the baseline tagger that is only trained on flat annotation.

[1]  Mitchell P. Marcus,et al.  Text Chunking using Transformation-Based Learning , 1995, VLC@ACL.

[2]  John A. Carroll,et al.  Robust, applied morphological generation , 2000, INLG.

[3]  Jin-Dong Kim,et al.  The GENIA corpus: an annotated research abstract corpus in molecular biology domain , 2002 .

[4]  Erik F. Tjong Kim Sang,et al.  Introduction to the CoNLL-2003 Shared Task: Language-Independent Named Entity Recognition , 2003, CoNLL.

[5]  Marti A. Hearst,et al.  A Simple Algorithm for Identifying Abbreviation Definitions in Biomedical Text , 2002, Pacific Symposium on Biocomputing.

[6]  James R. Curran,et al.  Language Independent NER using a Maximum Entropy Tagger , 2003, CoNLL.

[7]  Jian Su,et al.  Effective Adaptation of Hidden Markov Model-based Named Entity Recognizer for Biomedical Domain , 2003, BioNLP@ACL.

[8]  Jian Su,et al.  Recognizing Names in Biomedical Texts: a Machine Learning Approach , 2004 .

[9]  Nigel Collier,et al.  Introduction to the Bio-entity Recognition Task at JNLPBA , 2004, NLPBA/BioNLP.

[10]  Jian Su,et al.  Enhancing HMM-based biomedical named entity recognition by studying special phenomena , 2004, J. Biomed. Informatics.

[11]  Thomas C. Rindflesch,et al.  MedPost: a part-of-speech tagger for bioMedical text , 2004, Bioinform..

[12]  Seth Kulick,et al.  Integrated Annotation for Biomedical Information Extraction , 2004, HLT-NAACL 2004.

[13]  Malvina Nissim,et al.  Exploring the boundaries: gene and protein identification in biomedical text , 2005, BMC Bioinformatics.

[14]  Koby Crammer,et al.  Flexible Text Segmentation with Structured Multilabel Classification , 2005, HLT.

[15]  K. Bretonnel Cohen,et al.  Corpus Design for Biomedical Natural Language Processing , 2005, LBLODMBS@IDMB.

[16]  Claire Grover,et al.  Tools to Address the Interdependence between Tokenisation and Standoff Annotation , 2006, NLPXML@EACL.

[17]  Claire Grover,et al.  Rule-Based Chunking and Reusability , 2006, LREC.

[18]  G. D. Zhou,et al.  Recognizing names in biomedical texts using mutual information independence model and SVM plus sigmoid , 2006, Int. J. Medical Informatics.

[19]  Baohua Gu Recognizing Nested Named Entities in GENIA corpus , 2006, BioNLP@NAACL-HLT.

[20]  Jari Björne,et al.  BioInfer: a corpus for information extraction in the biomedical domain , 2007, BMC Bioinformatics.

[21]  Claire Grover,et al.  Adapting a Relation Extraction Pipeline for the BioCreAtIvE II Tasks , 2007 .