A Vector Model for Syntagmatic and Paradigmatic Relatedness

This paper introduces context digests, high-dimensional real-valued representations for the typical left and right contexts of a word. Initial entries for the context digests are formed from the word’s close left and right neighbors. A singular value decomposition reduces the dimensionality of the space to enable subsequent efficient processing. In contrast to similar techniques, no preprocessor such as a parser is required. Context digests summarize both syntagmatic and paradigmatic relations between words: how typical they are as neighbors and how well they are substitutable for each other. We apply context digests to identifying collocations, to assessing the similarity of the arguments of different verbs, and to clustering occurrences of adjectives and verbs according to the words they modify in context.