论文信息 - Robust Disambiguation of Named Entities in Text

Robust Disambiguation of Named Entities in Text

Disambiguating named entities in natural-language text maps mentions of ambiguous names onto canonical entities like people or places, registered in a knowledge base such as DBpedia or YAGO. This paper presents a robust method for collective disambiguation, by harnessing context from knowledge bases and using a new form of coherence graph. It unifies prior approaches into a comprehensive framework that combines three measures: the prior probability of an entity being mentioned, the similarity between the contexts of a mention and a candidate entity, as well as the coherence among candidate entities for all mentions together. The method builds a weighted graph of mentions and candidate entities, and computes a dense subgraph that approximates the best joint mention-entity mapping. Experiments show that the new method significantly outperforms prior methods in terms of accuracy, with robust behavior across a variety of inputs.

[1] Tru H. Cao,et al. Named entity disambiguation on an ontology enriched by Wikipedia , 2008, 2008 IEEE International Conference on Research, Innovation and Vision for the Future in Computing and Communication Technologies.

[2] David Yarowsky,et al. Cross-Document Coreference Resolution: A Key Technology for Learning by Reading , 2009, AAAI Spring Symposium: Learning by Reading and Learning to Read.

[3] Razvan C. Bunescu,et al. Using Encyclopedic Knowledge for Named entity Disambiguation , 2006, EACL.

[4] Silviu Cucerzan,et al. Large-Scale Named Entity Disambiguation Based on Wikipedia Data , 2007, EMNLP.

[5] Estevam R. Hruschka,et al. Toward an Architecture for Never-Ending Language Learning , 2010, AAAI.

[6] Roberto Navigli,et al. Word sense disambiguation: A survey , 2009, CSUR.

[7] Luis Gravano,et al. Introduction to the Special Issue on Man aging Information Extraction , 2008 .

[8] Christopher D. Manning,et al. Incorporating Non-local Information into Information Extraction Systems by Gibbs Sampling , 2005, ACL.

[9] Erik F. Tjong Kim Sang,et al. Introduction to the CoNLL-2003 Shared Task: Language-Independent Named Entity Recognition , 2003, CoNLL.

[10] Bo Zhang,et al. StatSnowball: a statistical approach to extracting entity relationships , 2009, WWW '09.

[11] Oren Etzioni,et al. Open Information Extraction from the Web , 2007, CACM.

[12] Rada Mihalcea,et al. Wikify!: linking documents to encyclopedic knowledge , 2007, CIKM '07.

[13] Jens Lehmann,et al. DBpedia: A Nucleus for a Web of Open Data , 2007, ISWC/ASWC.