On the Reproducibility of the TAGME Entity Linking System

Reproducibility is a fundamental requirement of scientific research. In this paper, we examine the repeatability, reproducibility, and generalizability of TAGME, one of the most popular entity linking systems. By comparing results obtained from its public API with (re)implementations from scratch, we obtain the following findings. The results reported in the TAGME paper cannot be repeated due to the unavailability of data sources. Some of the results are reproducible through the provided API, while the rest are not. We further show that the TAGME approach generalizes to the task of entity linking in queries. Finally, we provide insights gained during this process and formulate lessons learned to inform future reproducibility efforts.
