Using Singular-value Decomposition on Local Word Contexts to Derive a Measure of Constructional Similarity

This paper presents a novel method of generating word similarity scores from a term-by-n-gram context matrix compressed using Singular Value Decomposition (SVD). SVD is a statistical data analysis method that extracts the most significant components of variation from a large data matrix, and it has previously been used in methods such as Latent Semantic Analysis to identify latent semantic variables in text. We present the results of applying these scores to standard synonym benchmark tests and argue, on the basis of these results, that our similarity metric represents an aspect of word usage that is largely orthogonal to that addressed by other methods, such as Latent Semantic Analysis. In particular, the method appears to capture similarity with respect to the participation of words in grammatical constructions, at a level of generalization corresponding to broad syntactico-semantic classes such as body part terms and kin terms. Aside from assessing word similarity, this method has promising applications in language modeling and automatic lexical acquisition.
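
To make the pipeline summarized above concrete, the following is a minimal sketch, not the implementation evaluated in this paper. It rests on several assumptions made purely for illustration: a word's context is taken to be the trigram frame formed by its immediate left and right neighbours, the matrix holds raw counts with no weighting, rows are restricted to a small list of target terms, and similarity is the cosine between rows of the rank-reduced matrix. The function names (`frame_counts`, `reduce_rank`, `cosine`, `similarity`) and the toy corpus are hypothetical.

```python
from collections import Counter

import numpy as np


def frame_counts(sentences, targets):
    """Build a term-by-context count matrix for the target terms.

    Assumption: the context of a token is the pair (left neighbour,
    right neighbour), i.e. a trigram with the token's own slot left open.
    """
    counts = Counter()
    for sentence in sentences:
        padded = ["<s>"] + sentence.split() + ["</s>"]
        for i in range(1, len(padded) - 1):
            if padded[i] in targets:
                counts[(padded[i], (padded[i - 1], padded[i + 1]))] += 1
    contexts = sorted({ctx for _, ctx in counts})
    col = {ctx: j for j, ctx in enumerate(contexts)}
    row = {term: i for i, term in enumerate(targets)}
    matrix = np.zeros((len(targets), len(contexts)))
    for (term, ctx), n in counts.items():
        matrix[row[term], col[ctx]] = n
    return matrix


def reduce_rank(matrix, k):
    """Keep the k largest singular components; one k-dim vector per term."""
    u, s, _ = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :k] * s[:k]


def cosine(u, v):
    """Cosine similarity, defined as 0.0 when either vector is zero."""
    norm = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / norm) if norm > 0 else 0.0


# Hypothetical toy corpus: body part terms and kin terms occur in
# different frames, and no two terms have identical count profiles.
corpus = (
    ["my arm hurts"] * 2 + ["her arm hurts"]
    + ["my leg hurts"] + ["her leg hurts"] * 2
    + ["my mother called"] * 2 + ["her mother called"]
    + ["my sister called"] + ["her sister called"] * 2
)
terms = ["arm", "leg", "mother", "sister"]
index = {term: i for i, term in enumerate(terms)}

counts = frame_counts(corpus, terms)
vectors = reduce_rank(counts, k=2)


def similarity(a, b, space):
    """Cosine between the rows for terms a and b in the given space."""
    return cosine(space[index[a]], space[index[b]])


print("arm ~ leg    (raw counts):", similarity("arm", "leg", counts))
print("arm ~ leg    (rank 2):    ", similarity("arm", "leg", vectors))
print("arm ~ mother (rank 2):    ", similarity("arm", "mother", vectors))
```

On this toy corpus, truncation to two components discards the within-class differences in frame frequency, so "arm" and "leg" move closer together than they are in the raw counts, while "arm" and "mother" remain dissimilar. A realistic setting would use a large corpus, a much larger context vocabulary, and most likely some form of count weighting, none of which is specified here.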