This paper describes the joint Stanford-UBC knowledge base population system for the entity linking tasks. We participated in both the English and the cross-lingual tasks, using a dictionary from strings to possible Wikipedia titles, taken from our 2009 submission. This dictionary is based on frequencies of Wikipedia back-links, and it provides a strong context-independent baseline. For the English track, we improved on the results given by the dictionary by disambiguating entities using a distantly supervised classifier, trained on context extracted from Wikipedia. Since we did not use any text from the Wikipedia pages associated with the knowledge base nodes for the dictionary, we submitted that run to the no wiki text track, and the one using the distantly supervised classifier to the wiki text track. Our work focused on disambiguating among articles, allowing for very simple NIL strategies: the system returned NIL whenever selected Wikipedia articles were not present in the KB; moreover, NILs were then clustered only according to the target string. These simple approaches were sufficient for our runs to score above the median entry in each of their respective tracks for the English task; for the cross-lingual task, there was only one track, and our submissions (using the English-specific, context-independent dictionaries) fell below the median.
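The dictionary lookup and the NIL strategy described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the system's actual implementation: the function names `build_dictionary` and `link`, the `NIL:` identifier format, the toy anchor data, and the choice to return NIL for strings missing from the dictionary are all hypothetical.

```python
from collections import Counter, defaultdict


def build_dictionary(anchor_pairs):
    """Count how often each anchor string links to each Wikipedia title.

    anchor_pairs: iterable of (anchor_text, target_title) pairs, standing in
    for Wikipedia back-links (hypothetical input format).
    """
    counts = defaultdict(Counter)
    for anchor, title in anchor_pairs:
        counts[anchor][title] += 1
    return counts


def link(mention, dictionary, kb_titles):
    """Context-independent linking: pick the most frequent title for the string.

    Returns the title if it is in the KB. Otherwise returns a NIL id derived
    only from the target string, so identical strings share one NIL cluster.
    """
    candidates = dictionary.get(mention)
    if candidates:
        best_title, _ = candidates.most_common(1)[0]
        if best_title in kb_titles:
            return best_title
    # Assumption: strings with no dictionary entry are also treated as NIL.
    return "NIL:" + mention


# Toy back-link data (made up for illustration).
pairs = [
    ("Georgia", "Georgia (U.S. state)"),
    ("Georgia", "Georgia (U.S. state)"),
    ("Georgia", "Georgia (country)"),
    ("Paris", "Paris"),
]
dictionary = build_dictionary(pairs)
```

Because the NIL identifier depends only on the query string, two NIL mentions of the same string always fall into the same cluster, which mirrors the simple clustering strategy described in the abstract.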
[1] Valentin I. Spitkovsky et al. A Cross-Lingual Dictionary for English Wikipedia Concepts. LREC, 2012.
[2] Valentin I. Spitkovsky et al. Stanford-UBC at TAC-KBP. TAC, 2009.
[3] Eneko Agirre et al. Word Sense Disambiguation: Algorithms and Applications (Text, Speech and Language Technology). 2006.
[4] Valentin I. Spitkovsky et al. Strong Baselines for Cross-Lingual Entity Linking. TAC, 2011.
[5] Eneko Agirre et al. Word Sense Disambiguation: Algorithms and Applications. 2007.
[6] Heng Ji et al. Overview of the TAC 2010 Knowledge Base Population Track. 2010.
[7] Valentin I. Spitkovsky et al. Stanford-UBC Entity Linking at TAC-KBP. TAC, 2010.