Paper Information - Nested Named Entity Recognition

Nested Named Entity Recognition

Many named entities contain other named entities inside them. Despite this, the field of named entity recognition has almost entirely ignored nested named entity recognition, for technological rather than ideological reasons. In this paper, we present a new technique for recognizing nested named entities by using a discriminative constituency parser. To train the model, we transform each sentence into a tree, with constituents for each named entity (and no other syntactic structure). We present results on both newspaper and biomedical corpora which contain nested named entities. In three out of four sets of experiments, our model outperforms a standard semi-CRF on the more traditional top-level entities. At the same time, we improve the overall F-score by up to 30% over the flat model, which is unable to recover any nested entities.
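The sentence-to-tree transformation described in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function name `spans_to_tree`, the `(start, end, label)` span format with an exclusive end index, the `ROOT` label, and the example sentence are all assumptions made for illustration, and the sketch leaves out any further detail of the paper's actual tree representation.

```python
from typing import List, Tuple

# Hypothetical span format: (start, end_exclusive, label) over token indices.
Span = Tuple[int, int, str]


def spans_to_tree(tokens: List[str], entities: List[Span], root: str = "ROOT") -> str:
    """Render tokens plus properly nested entity spans as a bracketed tree.

    Each entity becomes one constituent; no other structure is added.
    """
    for start, end, _ in entities:
        if not 0 <= start < end <= len(tokens):
            raise ValueError(f"span out of range: {(start, end)}")

    # Sort so that an outer span always precedes the spans it contains.
    ordered = sorted(entities, key=lambda s: (s[0], -s[1]))

    # Reject crossing spans, which cannot be expressed as a tree.
    stack: List[Span] = []
    for span in ordered:
        while stack and stack[-1][1] <= span[0]:
            stack.pop()
        if stack and span[1] > stack[-1][1]:
            raise ValueError(f"crossing spans: {stack[-1]} and {span}")
        stack.append(span)

    def build(label: str, start: int, end: int, inner: List[Span]) -> str:
        parts: List[str] = []
        pos = start
        i = 0
        while i < len(inner):
            c_start, c_end, c_label = inner[i]
            parts.extend(tokens[pos:c_start])  # tokens before this child
            j = i + 1
            # Spans that start before the child ends are nested inside it.
            while j < len(inner) and inner[j][0] < c_end:
                j += 1
            parts.append(build(c_label, c_start, c_end, inner[i + 1:j]))
            pos = c_end
            i = j
        parts.extend(tokens[pos:end])  # tokens after the last child
        return "(" + label + " " + " ".join(parts) + ")"

    return build(root, 0, len(tokens), ordered)


if __name__ == "__main__":
    sentence = ["The", "Bank", "of", "China", "opened"]
    nested = [(1, 4, "ORG"), (3, 4, "GPE")]
    print(spans_to_tree(sentence, nested))
```

A flat tagger would have to choose between the outer `ORG` span and the inner `GPE` span here, whereas the tree keeps both, which is the property a constituency parser can then be trained to recover.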

Christopher D. Manning | Jenny Rose Finkel

[1] Kate Byrne. Nested Named Entity Recognition in Historical Archive Text, 2007, International Conference on Semantic Computing (ICSC 2007).

[2] Jian Su, et al. Recognizing Names in Biomedical Texts: a Machine Learning Approach, 2004.

[3] Alexander Clark, et al. Combining Distributional and Morphological Information for Part of Speech Induction, 2003, EACL.

[4] William W. Cohen, et al. Semi-Markov Conditional Random Fields for Information Extraction, 2004, NIPS.

[5] Mihai Surdeanu, et al. UPC: Experiments with Joint Learning within SemEval Task 9, 2007, SemEval@ACL.

[6] Mariona Taulé, et al. AnCora: Multilingual and Multilevel Annotated Corpora, 2007.

[7] Jian Su, et al. Effective Adaptation of Hidden Markov Model-based Named Entity Recognizer for Biomedical Domain, 2003, BioNLP@ACL.

[8] Jian Su, et al. Exploring Deep Knowledge Resources in Biomedical Name Recognition, 2004, NLPBA/BioNLP.

[9] Erik F. Tjong Kim Sang, et al. Introduction to the CoNLL-2003 Shared Task: Language-Independent Named Entity Recognition, 2003, CoNLL.

[10] Jin-Dong Kim, et al. The GENIA corpus: an annotated research abstract corpus in molecular biology domain, 2002.

[11] Sophia Ananiadou, et al. Developing a Robust Part-of-Speech Tagger for Biomedical Text, 2005, Panhellenic Conference on Informatics.