A Clustering Algorithm for Chinese Text Based on SOM Neural Network and Density

This paper introduces a clustering algorithm for Chinese text that combines a SOM (Self-Organizing Map) neural network with density-based clustering. The algorithm has two stages. In the first stage, Chinese texts are transformed into text vectors, which serve as training data for the SOM; mapping the vectors through the trained SOM yields an initial clustering result for the text data, namely a set of virtual coordinates. In the second stage, this virtual coordinate set is further clustered according to density. The first stage of the proposed algorithm differs from existing approaches. Moreover, because the second stage operates on the reduced-dimension coordinates, it requires less computing time than other algorithms. Numerical experiments show that the algorithm is efficient for clustering text data and high-dimensional data.
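
The two stages can be illustrated with a minimal sketch. It is written under stated assumptions and is not the implementation evaluated in the paper: the text vectors are replaced by synthetic random vectors, the SOM is a plain rectangular map with a linearly decaying learning rate and a Gaussian neighbourhood, and the density step is a DBSCAN-style procedure swapped in for illustration. The names `train_som`, `map_to_grid`, `density_cluster` and all parameter values are hypothetical.

```python
from collections import deque

import numpy as np


def train_som(data, rows, cols, epochs=20, lr0=0.5, sigma0=None, seed=0):
    """Train a small rectangular SOM; returns (weights, grid positions)."""
    rng = np.random.default_rng(seed)
    n = data.shape[0]
    # grid[i] is the (row, col) position of map unit i
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    # Initialise unit weights from randomly chosen training vectors
    weights = data[rng.integers(0, n, size=rows * cols)].astype(float)
    sigma0 = sigma0 or max(rows, cols) / 2.0
    total, step = epochs * n, 0
    for _ in range(epochs):
        for idx in rng.permutation(n):
            frac = step / total
            lr = lr0 * (1.0 - frac)                 # decaying learning rate
            sigma = sigma0 * (1.0 - frac) + 0.5     # shrinking neighbourhood radius
            x = data[idx]
            bmu = np.argmin(((weights - x) ** 2).sum(axis=1))  # best-matching unit
            d2 = ((grid - grid[bmu]) ** 2).sum(axis=1)
            h = np.exp(-d2 / (2.0 * sigma ** 2))    # Gaussian neighbourhood on the map
            weights += lr * h[:, None] * (x - weights)
            step += 1
    return weights, grid


def map_to_grid(data, weights, grid):
    """Stage 1 output: the 2-D map coordinate of each vector's best-matching unit."""
    d = ((data[:, None, :] - weights[None, :, :]) ** 2).sum(axis=2)
    return grid[d.argmin(axis=1)]


def density_cluster(points, eps, min_pts):
    """Stage 2 (assumed DBSCAN-style): returns cluster labels, -1 marks noise."""
    n = len(points)
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    neighbours = [np.flatnonzero(dist[i] <= eps) for i in range(n)]
    labels = np.full(n, -1)
    visited = np.zeros(n, dtype=bool)
    cluster = 0
    for i in range(n):
        if visited[i]:
            continue
        visited[i] = True
        if len(neighbours[i]) < min_pts:
            continue                                # not a core point; noise for now
        labels[i] = cluster
        queue = deque(neighbours[i])
        while queue:
            j = queue.popleft()
            if labels[j] == -1:
                labels[j] = cluster                 # core or border point joins cluster
            if visited[j]:
                continue
            visited[j] = True
            if len(neighbours[j]) >= min_pts:
                queue.extend(neighbours[j])         # expand only through core points
        cluster += 1
    return labels


# Synthetic stand-ins for text vectors: two groups in 50 dimensions (assumption).
rng = np.random.default_rng(42)
vectors = np.vstack([rng.normal(0.0, 0.1, (30, 50)),
                     rng.normal(1.0, 0.1, (30, 50))])
weights, grid = train_som(vectors, rows=5, cols=5, epochs=10)
coords = map_to_grid(vectors, weights, grid)            # stage 1: virtual coordinates
labels = density_cluster(coords, eps=1.5, min_pts=3)    # stage 2: density clustering
print(coords[:3], labels[:3])
```

In this sketch the density step always operates on two-dimensional map coordinates, whatever the dimension of the input vectors, which is one way to read the time saving attributed to the second stage.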