Extracting Named Entities and Synonyms from Wikipedia

In many search domains, both contents and searches are frequently tied to named entities such as a person, a company or similar. An example of such a domain is a news archive. One challenge from an information retrieval point of view is that a single entity can have more than one way of referring to it. In this paper we describe how to use Wikipedia contents to automatically generate a dictionary of named entities and synonyms that are all referring to the same entity. This dictionary can subsequently be used to improve search quality, for example using query expansion. Through an experimental evaluation we show that with our approach, we can find named entities and their synonyms with a high degree of accuracy.

[1]  Kentaro Torisawa,et al.  Exploiting Wikipedia as External Knowledge for Named Entity Recognition , 2007, EMNLP.

[2]  Christian Bøhn,et al.  Extracting Named Entities and Synonyms from Wikipedia for use in News Search , 2008 .

[3]  Richard M. Schwartz,et al.  Nymble: a High-Performance Learning Name-finder , 1997, ANLP.

[4]  Gjergji Kasneci,et al.  YAWN: A Semantically Annotated Wikipedia XML Corpus , 2007, BTW.

[5]  Silviu Cucerzan,et al.  Large-Scale Named Entity Disambiguation Based on Wikipedia Data , 2007, EMNLP.

[6]  E. Jaynes Information Theory and Statistical Mechanics , 1957 .

[7]  David M. Magerman Natural Language Parsing as Statistical Pattern Recognition , 1994, ArXiv.

[8]  Valentin Jijkoun,et al.  The Impact of Named Entity Normalization on Information Retrieval for Question Answering , 2008, ECIR.

[9]  Ralph Grishman,et al.  A Maximum Entropy Approach to Named Entity Recognition , 1999 .

[10]  Christiane Fellbaum,et al.  Book Reviews: WordNet: An Electronic Lexical Database , 1999, CL.

[11]  Bernardo Magnini,et al.  A WordNet-Based Approach to Named Entites Recognition , 2022 .

[12]  Max Mühlhäuser,et al.  Analyzing and accessing Wikipedia as a lexical semantic resource , 2007 .

[13]  Antonio Toral,et al.  A proposal to automatically build and maintain gazetteers for Named Entity Recognition by using Wikipedia , 2006, Workshop On New Text Wikis And Blogs And Other Dynamic Text Sources.

[14]  Razvan C. Bunescu,et al.  Using Encyclopedic Knowledge for Named entity Disambiguation , 2006, EACL.