A hybrid model of VSM and LDA for text clustering

The volume of web text is growing explosively, and text analysis remains an active research topic. The traditional vector space model (VSM) has two weaknesses in term weighting and similarity calculation: the data dimensionality is very high, and the model lacks semantic understanding. Both lead to inaccurate clustering results. To address this, this paper presents a hybrid model of VSM and latent Dirichlet allocation (LDA) for text clustering. After collecting and filtering the texts, we apply statistical methods to compute document similarity under the VSM model and under the LDA model separately. The two similarities are then combined by linear weighting to obtain a hybrid similarity. The texts are clustered with the K-means algorithm, and the clustering results of the three models (VSM, LDA, and hybrid) are visualized so that the merits of each model can be compared. The experimental results show that the hybrid model is effective.
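The abstract does not specify which similarity measures are used. As a minimal sketch, assume TF-IDF cosine similarity for the VSM side and a Jensen-Shannon-based similarity between per-document topic distributions for the LDA side. Both measures, the weight `lam`, and the function names are illustrative assumptions, not the paper's method. The topic distributions are taken as given, as if produced by an LDA fit that is not shown here. The hybrid similarity is `lam * sim_vsm + (1 - lam) * sim_lda`.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """VSM side (assumed TF-IDF weighting): one {term: weight} dict per tokenized doc."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # document frequency of each term
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        total = len(doc)
        vectors.append({t: (c / total) * math.log(n / df[t]) for t, c in tf.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse weight dicts; 0.0 if either vector is all zeros."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def js_similarity(p, q):
    """LDA side (assumed measure): 1 - Jensen-Shannon divergence (base 2), in [0, 1]."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with x_i == 0 contribute nothing; y_i = m_i > 0 whenever x_i > 0.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 1.0 - 0.5 * kl(p, m) - 0.5 * kl(q, m)


def hybrid_similarity(docs, topic_dists, lam=0.5):
    """Linear combination: lam * VSM similarity + (1 - lam) * LDA similarity."""
    vecs = tfidf_vectors(docs)
    n = len(docs)
    return [
        [
            lam * cosine(vecs[i], vecs[j])
            + (1 - lam) * js_similarity(topic_dists[i], topic_dists[j])
            for j in range(n)
        ]
        for i in range(n)
    ]


# Toy example; the topic distributions are made up, standing in for LDA output.
docs = [["topic", "model", "lda"], ["topic", "model", "text"], ["stock", "market", "price"]]
topic_dists = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9]]
S = hybrid_similarity(docs, topic_dists, lam=0.5)
for row in S:
    print([round(x, 3) for x in row])
```

Setting `lam=1` reduces this to the plain VSM similarity, and `lam=0` gives the LDA-only similarity. These correspond to the two single-model baselines the abstract compares against the hybrid. The resulting matrix would then feed the clustering step. How the paper adapts K-means to a similarity matrix is not stated in the abstract, so that step is left out of this sketch.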