Large-Scale Named Entity Disambiguation Based on Wikipedia Data

This paper presents a large-scale system for the recognition and semantic disambiguation of named entities based on information extracted from a large encyclopedic collection and Web search results. It describes in detail the disambiguation paradigm employed and the information extraction process from Wikipedia. Through a process of maximizing the agreement between the contextual information extracted from Wikipedia and the context of a document, as well as the agreement among the category tags associated with the candidate entities, the implemented system shows high disambiguation accuracy on both news stories and Wikipedia articles.
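
The agreement-maximization idea can be illustrated with a small sketch. This is not the paper's implementation: the function name `disambiguate`, the dictionary layout of the candidates, the use of plain set overlap as the agreement measure, and the exhaustive search over candidate combinations are all assumptions made for illustration, and the toy data is invented. The sketch scores each joint assignment of entities to mentions by (a) the overlap between each candidate's context words and the document's words and (b) the category overlap between every pair of chosen candidates, and keeps the highest-scoring assignment.

```python
from itertools import combinations, product


def disambiguate(doc_context, candidates):
    """Pick one entity per mention by maximizing a joint agreement score.

    doc_context: set of words observed in the document.
    candidates:  dict mapping each mention to a list of candidate entities,
                 each a dict with "name", "context" (set) and "categories" (set).
    Hypothetical scoring: raw set overlaps, summed without weighting.
    """
    mentions = list(candidates)
    best_score, best_combo = float("-inf"), None
    # Enumerate every joint assignment (feasible only for toy inputs).
    for combo in product(*(candidates[m] for m in mentions)):
        # Agreement between each candidate's context and the document.
        context_agreement = sum(len(doc_context & e["context"]) for e in combo)
        # Agreement among the category tags of the chosen candidates.
        category_agreement = sum(
            len(a["categories"] & b["categories"])
            for a, b in combinations(combo, 2)
        )
        score = context_agreement + category_agreement
        if score > best_score:
            best_score, best_combo = score, combo
    return {m: e["name"] for m, e in zip(mentions, best_combo)}


# Invented toy data: two ambiguous mentions in a document about a launch.
candidates = {
    "Columbia": [
        {"name": "Space Shuttle Columbia",
         "context": {"nasa", "shuttle", "orbit"},
         "categories": {"space shuttles"}},
        {"name": "Columbia University",
         "context": {"university", "new york"},
         "categories": {"universities"}},
    ],
    "Discovery": [
        {"name": "Space Shuttle Discovery",
         "context": {"nasa", "shuttle", "launch"},
         "categories": {"space shuttles"}},
        {"name": "Discovery Channel",
         "context": {"television", "network"},
         "categories": {"tv channels"}},
    ],
}
result = disambiguate({"nasa", "launch", "mission"}, candidates)
```

In this toy case the two shuttle candidates both match the document's words and share a category, so they are selected jointly. A system at the scale the abstract describes could not enumerate all combinations this way; the exhaustive loop is only meant to make the objective explicit.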
