Evaluating Entity Linking with Wikipedia

Named Entity Linking (NEL) grounds each entity mention to its corresponding node in a Knowledge Base (KB). Recently, a number of systems have been proposed for linking entity mentions in text to Wikipedia pages. Such systems typically search for candidate entities and then disambiguate them, returning either the best candidate or NIL. However, comparison has focused on disambiguation accuracy, making it difficult to determine how search impacts performance. Furthermore, important approaches from the literature have not been systematically compared on standard data sets. We reimplement three seminal NEL systems and present a detailed evaluation of search strategies. Our experiments find that coreference and acronym handling lead to substantial improvement, and that search strategies account for much of the variation between systems. This finding is notable because these aspects of the problem have often been neglected in the literature, which has focused largely on complex candidate ranking algorithms.
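
The two-stage pipeline described above (search for candidates, then disambiguate, returning the best candidate or NIL) can be illustrated with a minimal sketch. It is not any of the reimplemented systems: the toy knowledge base, the alias index, the function names, and the word-overlap scoring are all assumptions made for illustration, and NIL is represented here by Python's None.

```python
from typing import List, Optional

# Hypothetical toy knowledge base: entity title -> description words.
KB = {
    "Apple Inc.": "technology company iphone mac computer",
    "Apple (fruit)": "fruit tree orchard edible",
}

# Hypothetical alias index: lowercased surface form -> candidate titles.
ALIASES = {
    "apple": ["Apple Inc.", "Apple (fruit)"],
}


def search(mention: str) -> List[str]:
    """Search stage: look up candidate entities for a mention string."""
    return ALIASES.get(mention.lower(), [])


def disambiguate(candidates: List[str], context: str) -> Optional[str]:
    """Disambiguation stage: pick the candidate whose description shares
    the most words with the context; return None (NIL) if nothing matches."""
    if not candidates:
        return None
    context_words = set(context.lower().split())

    def overlap(title: str) -> int:
        return len(context_words & set(KB[title].split()))

    best = max(candidates, key=overlap)
    return best if overlap(best) > 0 else None


def link(mention: str, context: str) -> Optional[str]:
    """Full pipeline: search, then disambiguate."""
    return disambiguate(search(mention), context)


if __name__ == "__main__":
    print(link("Apple", "The company released a new iPhone"))
    print(link("Pear", "The company released a new iPhone"))
```

In this sketch a mention with no entry in the alias index can never be linked, however good the ranking function is, which is one way to see why the search stage can account for variation between systems.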
