Paper Information - A New Text Clustering Algorithm Based on Improved K_means

A New Text Clustering Algorithm Based on Improved K_means

Text clustering is a difficult and actively studied problem in internet search engine research. This paper presents a new text clustering algorithm based on K-means and the Self-Organizing Model (SOM). First, texts are preprocessed to meet the requirements of the subsequent processing steps. Second, the paper improves how K-means selects its initial cluster centers and cluster seeds, addressing the algorithm's high sensitivity to the initial cluster centers and to isolated (outlier) texts. Third, the advantages of K-means and SOM are combined into a new model for clustering text. Finally, experimental results indicate that the improved algorithm achieves higher accuracy and better stability than the original algorithm.
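
The abstract does not say how the initial cluster centers are chosen, so the following is only an illustrative sketch of the general idea of seeding K-means deterministically instead of at random. It assumes farthest-first (maximin) seeding, which is a swapped-in technique and not confirmed to be the paper's method. The function names `sq_dist`, `farthest_first_seeds`, and `kmeans` are hypothetical, the input vectors stand in for preprocessed text features (for example TF-IDF vectors), and the SOM stage is not covered.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first_seeds(points, k):
    """Assumed seeding rule (not from the paper): start from the point
    nearest the data mean, then repeatedly add the point that is
    farthest from all seeds chosen so far."""
    dim = len(points[0])
    mean = [sum(p[d] for p in points) / len(points) for d in range(dim)]
    seeds = [min(points, key=lambda p: sq_dist(p, mean))]
    while len(seeds) < k:
        seeds.append(max(points, key=lambda p: min(sq_dist(p, s) for s in seeds)))
    return seeds


def kmeans(points, k, max_iter=100):
    """Standard K-means iterations started from the seeds above."""
    centers = [list(s) for s in farthest_first_seeds(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest center.
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(c) / len(members) for c in zip(*members)]
    return labels, centers


# Toy stand-in for document vectors: two well-separated groups.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centers = kmeans(points, k=2)
```

Because the seeds are chosen deterministically, repeated runs on the same data give the same clustering, which is one simple way to reduce the sensitivity to initial centers that the abstract mentions. Note that farthest-first seeding on its own tends to pick outliers as seeds, so a treatment of isolated points like the one the paper describes would need an additional filtering step.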

Xinwu Li

[1] Nikos A. Vlassis, et al. The global k-means clustering algorithm, 2003, Pattern Recognit.

[2] Tao Li, et al. Document clustering via adaptive subspace iteration, 2004, SIGIR '04.

[3] Tan Yong. An Implementation of Clustering Algorithm Based on K-means, 2004.

[4] Zhang Yu, et al. An Improved K-means Algorithm, 2003.

[5] Greg Hamerly, et al. Learning the k in k-means, 2003, NIPS.