A New Text Clustering Algorithm Based on Improved K-means

Text clustering is an active and difficult research area in Internet search engines. This paper presents a new text clustering algorithm based on K-means and the Self-Organizing Map (SOM). First, the texts are preprocessed to meet the requirements of the subsequent steps. Second, the selection of initial cluster centers and cluster seeds in K-means is improved, to address the algorithm's strong sensitivity to the initial cluster centers and to isolated-point texts (outliers). Third, the advantages of K-means and SOM are combined in a new model for clustering text. Finally, experimental results indicate that the improved algorithm achieves higher accuracy and better stability than the original algorithm.
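
The abstract does not say how the initial centers are chosen, so the following is only a minimal sketch of one common way to reduce K-means' sensitivity to initial centers and isolated points. It is not the paper's method. The density-filtered farthest-first seeding, the function names (`select_seeds`, `kmeans`), the `radius` parameter, and the toy 2-D points standing in for text vectors are all assumptions made for illustration. The SOM stage is not shown.

```python
import math

def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def select_seeds(points, k, radius):
    """Pick k initial centers, skipping isolated points (illustrative heuristic).

    Assumption: a point with no neighbour within `radius` is "isolated"
    and must not become a seed.
    """
    # Density of a point = number of other points within `radius`.
    density = [
        sum(1 for j, q in enumerate(points) if j != i and dist(p, q) <= radius)
        for i, p in enumerate(points)
    ]
    candidates = [i for i, d in enumerate(density) if d > 0]
    if len(candidates) < k:
        raise ValueError("not enough non-isolated points for k seeds")
    # First seed: the densest candidate.
    seeds = [max(candidates, key=lambda i: density[i])]
    # Remaining seeds: the candidate farthest from all seeds chosen so far.
    while len(seeds) < k:
        nxt = max(
            (i for i in candidates if i not in seeds),
            key=lambda i: min(dist(points[i], points[s]) for s in seeds),
        )
        seeds.append(nxt)
    return [list(points[i]) for i in seeds]

def kmeans(points, centers, max_iter=100):
    """Standard Lloyd iterations starting from the given centers."""
    centers = [list(c) for c in centers]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centers)), key=lambda c: dist(p, centers[c]))
            for p in points
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(len(centers)):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers

# Toy data: two tight groups plus one isolated point at (20, 20).
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (20, 20)]
seeds = select_seeds(points, k=2, radius=2.0)
labels, centers = kmeans(points, seeds)
print(seeds, labels)
```

In this sketch the isolated point is still assigned to its nearest cluster; it is only excluded from seeding. A real text pipeline would more likely use TF-IDF vectors with cosine distance, which is likewise an assumption rather than something the abstract states.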