Vector-based semantic analysis: representing word meanings based on random labels

Vector-based semantic analysis is the practice of representing word meanings as semantic vectors, computed from the co-occurrence statistics of words in large amounts of text data. This paper discusses the theoretical assumptions behind this practice, and a representational scheme based on the Distributional Hypothesis is identified as the rationale for vector-based semantic analysis. A new method for computing semantic word vectors is then described. The method uses random labelling of words in narrow context windows to compute a semantic context vector for each word type in the text data. The method is evaluated on a standardised synonym test, and it is shown that incorporating linguistic information in the context vectors can improve the results.
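The following is a minimal sketch of the random-labelling idea summarised above: each word type receives a sparse random label, and the semantic context vector of a word is accumulated from the labels of words occurring within a narrow context window around its occurrences. The dimensionality, number of non-zero label elements, window size, and toy input are illustrative assumptions, not the paper's actual settings.

```python
import numpy as np

def random_label(dim=2000, nonzeros=8, rng=None):
    """Sparse ternary random label: mostly zeros, a few +1/-1 entries (assumed format)."""
    rng = rng or np.random.default_rng()
    label = np.zeros(dim)
    idx = rng.choice(dim, size=nonzeros, replace=False)
    label[idx] = rng.choice([-1.0, 1.0], size=nonzeros)
    return label

def context_vectors(tokens, dim=2000, window=2, rng=None):
    """For each word type, accumulate the random labels of words seen
    within a narrow context window around its occurrences."""
    rng = rng or np.random.default_rng(0)
    labels, vectors = {}, {}
    for i, word in enumerate(tokens):
        vectors.setdefault(word, np.zeros(dim))
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j == i:
                continue
            neighbour = tokens[j]
            if neighbour not in labels:
                labels[neighbour] = random_label(dim, rng=rng)
            vectors[word] += labels[neighbour]
    return vectors

# Example: semantic similarity as cosine between context vectors.
tokens = "the cat sat on the mat the dog sat on the rug".split()
vecs = context_vectors(tokens)
cat, dog = vecs["cat"], vecs["dog"]
cosine = cat @ dog / (np.linalg.norm(cat) * np.linalg.norm(dog) + 1e-12)
print(f"cosine(cat, dog) = {cosine:.3f}")
```

In this sketch, words that occur in similar contexts accumulate similar combinations of random labels, so their context vectors end up close in the vector space; the cosine measure at the end is one common, but here assumed, choice of similarity metric.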