Across-Document Neighborhood Expansion: UMass at TAC KBP 2012 Entity Linking

Last year’s competition demonstrated that the named-entity (NER) context of a mention contains important information that should not be ignored in entity linking. Once Wikipedia candidates have been selected, state-of-the-art approaches anchor on unambiguous entities, look for overlap in categories, or approximate a joint model of candidate assignments. Current candidate-generation approaches, such as anchor-text maps, are effective but can produce very large candidate sets that must then be examined. Our UMass submission to TAC has two objectives. First, we use cross-document context information to expand each entity's neighborhood, and we estimate the importance of entity context using corpus-wide information. Second, we use a probabilistic information retrieval model that incorporates this neighborhood information to generate a ranked candidate set in a single step. The result is a small candidate set: even with fewer than 50 candidates, it contains the true answer in 95% of cases, which makes computationally intensive inference feasible in the next phase. Notably, our best-performing run simply predicts the top candidate of this unsupervised candidate ranking, and it outperforms more than half of the participating systems.
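To make the single-step retrieval idea concrete, here is a minimal sketch under stated assumptions. It uses a weighted query-likelihood model with Dirichlet smoothing, a common probabilistic IR choice, but it is not claimed to be the exact model behind the submission. The query mixes the mention's own terms (weight 1.0) with the terms of neighboring entities, each scaled by a per-neighbor weight that stands in for the corpus-wide importance estimate. The function names, the `neighbor_weight` input, the toy knowledge base, and the parameters `mu` and `k` are all hypothetical.

```python
import math
from collections import Counter


def dirichlet_log_prob(term, doc_tf, doc_len, coll_tf, coll_len, mu=2000.0):
    """log P(term | doc) under Dirichlet smoothing (hypothetical helper).

    The background (collection) model uses add-one smoothing so that
    terms unseen anywhere in the collection still get non-zero probability.
    """
    p_coll = (coll_tf.get(term, 0) + 1) / (coll_len + len(coll_tf) + 1)
    return math.log((doc_tf.get(term, 0) + mu * p_coll) / (doc_len + mu))


def build_weighted_query(mention, neighbors, neighbor_weight):
    """Expand the mention with neighbor-entity terms (assumed weighting scheme).

    Mention terms get weight 1.0; each neighbor's terms get that neighbor's
    weight, which here is simply given as input (a stand-in for
    corpus-wide importance estimates).
    """
    query = Counter()
    for term in mention.lower().split():
        query[term] += 1.0
    for name in neighbors:
        w = neighbor_weight.get(name, 0.0)
        for term in name.lower().split():
            query[term] += w
    return query


def rank_candidates(mention, neighbors, neighbor_weight, kb, k=50, mu=2000.0):
    """Score every KB entry against the expanded query; return the top k.

    Candidate generation and ranking happen in one pass over the
    (toy, in-memory) knowledge base `kb`, a dict of title -> text.
    """
    docs = {title: Counter(text.lower().split()) for title, text in kb.items()}
    coll_tf = Counter()
    for tf in docs.values():
        coll_tf.update(tf)
    coll_len = sum(coll_tf.values())

    query = build_weighted_query(mention, neighbors, neighbor_weight)
    total_w = sum(query.values())

    scores = {}
    for title, tf in docs.items():
        doc_len = sum(tf.values())
        # Weighted sum of per-term log-likelihoods, weights normalized to 1.
        scores[title] = sum(
            (w / total_w) * dirichlet_log_prob(t, tf, doc_len, coll_tf, coll_len, mu)
            for t, w in query.items()
        )
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:k]
```

For example, with a toy knowledge base holding a basketball-player page and a scientist page for "Michael Jordan", passing neighbors such as "Chicago Bulls" and "NBA" lifts the basketball page to the top, because only that page contains the neighbor terms. Truncating to the top `k` mirrors the idea of keeping a small candidate set for a more expensive inference stage afterwards.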