Research on Chinese text classification based on Word2vec

The feature set selected by the traditional chi-square feature selection algorithm is incomplete, which lowers the performance of the final text classifier. This paper therefore proposes a method that uses word vectors generated by word2vec to improve chi-square feature selection. The algorithm applies the word2vec word vectors during traditional feature selection to find words that are semantically similar to the selected feature words, and uses these words to supplement the feature set as appropriate. The resulting feature set has better discriminatory power, because words that are semantically similar to a strongly discriminative feature word are themselves well able to distinguish categories. On this basis, multiple experiments were carried out. The experimental results show that text classification performance improves after the feature words are extended.
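The idea of supplementing chi-square features with semantically similar words can be illustrated with a minimal sketch. It is not the paper's implementation: the toy corpus, the two-dimensional vectors (hand-written stand-ins for word2vec embeddings), the helper names `chi_square_scores` and `expand_features`, and the similarity threshold of 0.9 are all assumptions chosen for illustration.

```python
import math


def chi_square_scores(docs, labels):
    """Return {word: max chi-square statistic over all classes}."""
    n = len(docs)
    classes = sorted(set(labels))
    doc_sets = [set(d) for d in docs]
    vocab = sorted({w for s in doc_sets for w in s})
    scores = {}
    for w in vocab:
        best = 0.0
        for cls in classes:
            # 2x2 contingency table: word present/absent vs. in/out of class
            a = sum(1 for s, y in zip(doc_sets, labels) if w in s and y == cls)
            b = sum(1 for s, y in zip(doc_sets, labels) if w in s and y != cls)
            c = sum(1 for s, y in zip(doc_sets, labels) if w not in s and y == cls)
            d = n - a - b - c
            denom = (a + c) * (b + d) * (a + b) * (c + d)
            if denom:
                best = max(best, n * (a * d - b * c) ** 2 / denom)
        scores[w] = best
    return scores


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv) if nu and nv else 0.0


def expand_features(seed, vectors, threshold=0.9):
    """Add words whose vector is close to the vector of any seed feature."""
    expanded = list(seed)
    for w in seed:
        if w not in vectors:
            continue
        for cand, vec in vectors.items():
            if cand not in expanded and cosine(vectors[w], vec) >= threshold:
                expanded.append(cand)
    return expanded


# Toy tokenized corpus (hypothetical data).
docs = [
    ["football", "match", "goal"],
    ["football", "team", "win"],
    ["stock", "market", "price"],
    ["stock", "bank", "loan"],
]
labels = ["sports", "sports", "finance", "finance"]

# Hand-written vectors standing in for trained word2vec embeddings.
vectors = {
    "football": [1.0, 0.1],
    "soccer": [0.9, 0.2],
    "stock": [0.1, 1.0],
    "shares": [0.2, 0.9],
    "weather": [0.7, 0.7],
}

scores = chi_square_scores(docs, labels)
# Keep the two highest-scoring words; ties are broken alphabetically.
top = sorted(scores, key=lambda w: (-scores[w], w))[:2]
expanded = expand_features(top, vectors)
print(top, expanded)
```

In this sketch the chi-square step keeps only words seen in the training documents, while the expansion step can add near-synonyms such as "soccer" that never appeared in them. In practice the vectors would come from a word2vec model trained on a large segmented Chinese corpus, and the threshold would need tuning.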