Less is More: Filtering Abnormal Dimensions in GloVe

GloVe (Global Vectors for Word Representation) performs well on some word analogy and semantic relatedness tasks. However, we find that some dimensions of the trained word embeddings are abnormal. We verify this conjecture by removing the abnormal dimensions, identified with the Kolmogorov-Smirnov test, and evaluating on several benchmark datasets for semantic relatedness. The experimental results confirm our finding: on some of these tasks, simply removing the abnormal dimensions outperforms the state-of-the-art model SensEmbed. This simple rule-of-thumb technique improves performance and is expected to be useful in practice.
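As a rough illustration of the idea (a sketch, not the authors' exact procedure), the snippet below drops embedding dimensions whose per-dimension value distribution fails a Kolmogorov-Smirnov test against a standard normal reference; the normality reference, the standardization step, and the `alpha` threshold are illustrative assumptions.

```python
import numpy as np
from scipy.stats import kstest

def filter_abnormal_dimensions(embeddings, alpha=0.05):
    """Keep only dimensions that pass a KS normality test.

    embeddings: (vocab_size, dim) array of GloVe vectors.
    alpha: significance level below which a dimension is treated
           as abnormal and dropped (illustrative default).
    """
    keep = []
    for d in range(embeddings.shape[1]):
        column = embeddings[:, d]
        # Standardize so the KS test compares against N(0, 1).
        z = (column - column.mean()) / column.std()
        _, p_value = kstest(z, 'norm')
        # A very small p-value means this dimension's distribution
        # deviates strongly from the reference; drop it.
        if p_value >= alpha:
            keep.append(d)
    return embeddings[:, keep], keep
```

A usage pattern would be to load the trained GloVe matrix, apply the filter, and then re-run the semantic relatedness benchmarks on the reduced embeddings, tuning `alpha` on a held-out validation set.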