WordNetContext: Information Retrieval-friendly Access to WordNet Senses

Knowledge graphs have been shown to improve information retrieval effectiveness, in particular when combined with entity linking [3, 6, 13, 10], an approach that set a new standard on the Robust 2004 test collection. When utilizing knowledge graphs and semantic annotations in information retrieval, two of the most useful features are the full text of the Wikipedia article (Wiki-text) and the textual context surrounding entity links [3, 6]. For a given free-text query, both Wiki-text and entity link contexts effectively support the retrieval of relevant entities; they constitute a rich source of query-specific expansion terms and entity-aware text relevance features. We are now adapting this approach to make better use of WordNet for information retrieval. WordNet is a lexical database that has been curated manually over several decades according to psycholinguistic and computational theories of human lexical memory [7]. The major hurdle is that WordNet is a “vertical” resource that describes a taxonomic hierarchy of terms, whereas information retrieval also requires “horizontal” information, i.e., access to other contextually related words for the same word sense. Currently, the only horizontal information available in WordNet is the short gloss attached to each synset. Princeton’s SemCor [8] constitutes an early attempt to link text tokens to the appropriate WordNet synsets. However, this resource is small and the annotated text is somewhat outdated. While manually selected synsets improve retrieval [12], fully automated approaches either expand with all synsets or incorporate expensive word sense disambiguation into the retrieval step [9]. Here we investigate a third approach by building a “horizontal” resource: we apply word sense disambiguation [9] to large corpora and extract the contexts surrounding disambiguated word senses. We construct WordNetContext, an auxiliary text resource to accompany WordNet, by associating each word sense with (1) its gloss and (2) all of its sense contexts. 
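The construction step can be illustrated with a minimal Python sketch. It assumes that a word sense disambiguation system has already tagged corpus tokens with sense identifiers. The function name `build_wordnet_context`, the sense identifiers, the toy glosses, and the fixed token window are all hypothetical choices made for illustration, not details of the actual resource.

```python
# Minimal sketch: associate each word sense with (1) its gloss and
# (2) all text windows surrounding its disambiguated occurrences.
# Sense ids, glosses, and the window size are illustrative assumptions.

def build_wordnet_context(glosses, tagged_sentences, window=3):
    """Return {sense_id: {"gloss": str, "contexts": [str, ...]}}."""
    resource = {
        sense: {"gloss": gloss, "contexts": []}
        for sense, gloss in glosses.items()
    }
    for sentence in tagged_sentences:
        tokens = [token for token, _ in sentence]
        for i, (_, sense) in enumerate(sentence):
            if sense is None or sense not in resource:
                continue  # untagged token or unknown sense
            start = max(0, i - window)
            end = min(len(tokens), i + window + 1)
            resource[sense]["contexts"].append(" ".join(tokens[start:end]))
    return resource


# Toy input: (token, sense id or None) pairs, as a WSD system might emit them.
GLOSSES = {
    "bank#finance": "a financial institution that accepts deposits",
    "bank#river": "sloping land beside a body of water",
}
TAGGED = [
    [("she", None), ("deposited", None), ("money", None),
     ("at", None), ("the", None), ("bank", "bank#finance")],
    [("they", None), ("fished", None), ("from", None),
     ("the", None), ("river", None), ("bank", "bank#river")],
]

resource = build_wordnet_context(GLOSSES, TAGGED)
for sense, entry in resource.items():
    print(sense, "|", entry["gloss"], "|", entry["contexts"])
```

Each entry then serves as one retrievable document per word sense; a real pipeline would presumably use sentence or passage boundaries rather than a fixed token window.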
This new resource enables fast and efficient identification of the WordNet sense that is relevant to a keyword query, simply by indexing the resource and retrieving from it. As a result, we obtain a reliable means for fully automated query expansion through disambiguated synonyms. We use this approach to cross-reference knowledge graphs with relevant WordNet senses. As depicted in Figure 1, these cross-references are based on the similarity between Wiki-text and entries in our WordNetContext resource. Finally, the text of the WordNetContext resource will be overlaid with annotations of WordNet’s morphosyntactic and semantic relations [4]. Since many queries and corpora, as well as WordNet and Wikipedia, are multilingual, we also envision various feedback mechanisms relevant for cross-language information retrieval. At its core, the new WordNetContext resource provides an ecosystem for the exchange of sense mappings and relations, including “horizontal” information about co-occurring terms, phrases, and Wikipedia entities. Therefore, we believe that the availability of WordNetContext will substantially increase the usefulness of WordNet for information retrieval and text understanding. To the best of our knowledge, previous works [5, 11, 1, 2] have not explored such a resource for disambiguation and expansion in retrieval.
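The retrieval-based sense identification and synonym expansion described above can be sketched as follows. This is a toy illustration under stated assumptions: the entries, sense identifiers, and synonym lists are hypothetical, and a simple length-normalized TF-IDF score stands in for whatever retrieval model is actually used to index the resource.

```python
import math
from collections import Counter

# Hypothetical toy entries: sense id -> (synonyms, gloss plus sense contexts).
ENTRIES = {
    "bank#finance": (
        ["bank", "depository"],
        "a financial institution that accepts deposits "
        "money at the bank loan account",
    ),
    "bank#river": (
        ["bank", "riverside"],
        "sloping land beside a body of water "
        "from the river bank fishing",
    ),
}


def tokenize(text):
    return text.lower().split()


def rank_senses(query, entries):
    """Rank sense ids by a length-normalized TF-IDF score (a stand-in model)."""
    docs = {sid: Counter(tokenize(text)) for sid, (_, text) in entries.items()}
    n_docs = len(docs)

    def idf(term):
        df = sum(1 for tf in docs.values() if term in tf)
        return math.log((n_docs + 1) / (df + 1)) + 1.0

    scores = {}
    for sid, tf in docs.items():
        length = sum(tf.values())
        scores[sid] = sum(tf[t] / length * idf(t) for t in tokenize(query))
    return sorted(scores, key=scores.get, reverse=True)


def expand_query(query, entries):
    """Append the synonyms of the top-ranked sense that the query lacks."""
    best = rank_senses(query, entries)[0]
    query_terms = set(tokenize(query))
    extra = [s for s in entries[best][0] if s not in query_terms]
    expanded = " ".join([query] + extra)
    return best, expanded


print(expand_query("bank loan deposits", ENTRIES))
print(expand_query("river bank fishing", ENTRIES))
```

Because the sense is selected by retrieval over pre-built entries, no word sense disambiguation has to run at query time; the expensive disambiguation step is paid once when the resource is constructed.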