Learning Lexical Properties from Word Usage Patterns: Which Context Words Should be Used?

Several recent papers have described how lexical properties of words can be captured by simple measurements of which other words tend to occur close to them. At a practical level, word co-occurrence statistics are used to generate high-dimensional vector space representations, and appropriate distance metrics are defined on those spaces. The resulting co-occurrence vectors have been used to account for phenomena ranging from semantic priming to vocabulary acquisition. We have developed a simple and highly efficient system for computing useful word co-occurrence statistics, along with a number of criteria for optimizing and validating the resulting representations. Other workers have advocated various methods for reducing the number of dimensions in the co-occurrence vectors: Lund & Burgess and Landauer & Dumais have each proposed such methods, and Lowe & McDonald [8] have used a statistical reliability criterion. We have used a simpler framework that orders the dimensions by the frequency of their context words and truncates them. Here we compare how the different methods perform on two evaluation criteria and briefly discuss the consequences of the different methodologies for work within cognitive or neural computation.
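The frequency-ordered truncation described above can be illustrated with a minimal sketch. It is not the authors' system: the toy corpus, the symmetric window of two words, the cut-off of three dimensions, the cosine distance, and the function names `cooccurrence_vectors` and `cosine_distance` are all assumptions chosen for illustration.

```python
from collections import Counter
import math


def cooccurrence_vectors(tokens, window=2, n_dims=3):
    """Build co-occurrence vectors over the n_dims most frequent context words.

    Hypothetical helper: the window size and the cut-off are illustrative
    assumptions, not values taken from the paper.
    """
    freq = Counter(tokens)
    # Order candidate context words by descending corpus frequency
    # (ties broken alphabetically so the ordering is deterministic),
    # then truncate to the first n_dims dimensions.
    context = sorted(freq, key=lambda w: (-freq[w], w))[:n_dims]
    index = {w: i for i, w in enumerate(context)}
    vectors = {w: [0] * len(context) for w in freq}
    for i, target in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            # Count only neighbours that survived the truncation.
            if j != i and tokens[j] in index:
                vectors[target][index[tokens[j]]] += 1
    return context, vectors


def cosine_distance(u, v):
    """One possible distance metric on the vector space (an assumption)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm if norm else 1.0


# Toy corpus, invented purely for illustration.
tokens = "the cat sat on the mat the dog sat on the rug".split()
context, vectors = cooccurrence_vectors(tokens, window=2, n_dims=3)
print(context)
print(cosine_distance(vectors["cat"], vectors["dog"]))
```

In this sketch, words that appear in similar contexts (here "cat" and "dog") end up with similar vectors and therefore a small distance. Raising `n_dims` keeps progressively rarer context words, which is the single parameter a frequency-based truncation has to tune.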