Comparative Evaluation of Link-Based Approaches for Candidate Ranking in Link-to-Wikipedia Systems

In recent years, the task of automatically linking pieces of text (anchors) mentioned in a document to the Wikipedia articles that represent their meaning has received extensive research attention. Typically, link-to-Wikipedia systems first find a set of Wikipedia articles that are candidates to represent the meaning of the anchor and then rank these candidates to select the most appropriate one. In this ranking process, the systems rely on context information obtained from the document where the anchor is mentioned, from Wikipedia, or from both. In this paper we focus on the use of Wikipedia links as context information. In particular, we review several state-of-the-art candidate ranking approaches that rely on Wikipedia link information. In addition, we provide a comparative empirical evaluation of these approaches on five corpora: the TAC 2010 corpus and four corpora built from actual Wikipedia articles and news items.
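
To make the two-step pipeline concrete (generate candidates, then rank them using link-based context), the following is a minimal sketch and not any of the specific approaches evaluated in the paper. It assumes that each candidate article is described by the set of articles it links to, and it scores a candidate by the Jaccard overlap between that set and the set of articles already associated with the document context. The function names, the `LINK_GRAPH` dictionary, and the toy data are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard overlap between two sets of article titles."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_candidates(
    candidates: List[str],
    context_links: Set[str],
    link_graph: Dict[str, Set[str]],
) -> List[Tuple[str, float]]:
    """Score each candidate by link overlap with the context, best first.

    This is an illustrative stand-in for a link-based ranking function;
    ties are broken alphabetically so the output is deterministic.
    """
    scored = [
        (cand, jaccard(link_graph.get(cand, set()), context_links))
        for cand in candidates
    ]
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


# Hypothetical toy link data: article title -> titles it links to.
LINK_GRAPH: Dict[str, Set[str]] = {
    "Jaguar_(animal)": {"Felidae", "South_America", "Predator"},
    "Jaguar_Cars": {"Coventry", "Automobile", "Tata_Motors"},
}

# Hypothetical context: articles already linked from the document.
context = {"Automobile", "Tata_Motors", "England"}

ranking = rank_candidates(["Jaguar_(animal)", "Jaguar_Cars"], context, LINK_GRAPH)
for title, score in ranking:
    print(f"{title}: {score:.2f}")
```

In this toy setting the anchor "Jaguar" in a document about car manufacturing is resolved to `Jaguar_Cars`, because its outgoing links overlap most with the context. Real systems replace the overlap score with richer link-based measures and typically combine several of them.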
