Word frequency effects in high-dimensional co-occurrence models: A new approach

The HAL (hyperspace analog to language) model of lexical semantics uses global word co-occurrence counts from a large corpus of text to calculate the distance between words in co-occurrence space. We have implemented a system called HiDEx (High Dimensional Explorer) that extends HAL in two ways: it removes the unwanted influence of orthographic frequency from the distance measures, and it counts the words that lie within a given distance of the word of interest (NCount, the number of neighbors). These two changes to the HAL model produce