Paper information - A linguistic feature based text clustering method

A linguistic feature based text clustering method

The traditional K-means algorithm is sensitive to the choice of initial centers and easily falls into a local optimum. To avoid this flaw, an improved K-means text clustering method, WIKTCM, is proposed. The new method introduces a new way of selecting the initial centers and takes into account how features with different parts of speech contribute to the text. It also considers the impact of outliers. Experimental results show that the new method yields better clustering results.
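The sensitivity to initial centers mentioned in the abstract can be reproduced on a toy example. The sketch below is plain Lloyd-style K-means written for illustration only; it is not WIKTCM, whose initialization, part-of-speech weighting, and outlier handling are not specified here. The names `sq_dist`, `assign`, and `kmeans`, and the four sample points, are assumptions made for this example.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points: List[Point], centers: List[Point]) -> List[int]:
    """Index of the nearest center for every point."""
    return [
        min(range(len(centers)), key=lambda k: sq_dist(p, centers[k]))
        for p in points
    ]


def kmeans(points: List[Point], init_centers: List[Point], max_iter: int = 100):
    """Plain K-means started from the given centers.

    Returns (labels, centers, sse), where sse is the sum of squared
    distances from each point to its assigned center.
    """
    centers = [tuple(c) for c in init_centers]
    for _ in range(max_iter):
        labels = assign(points, centers)
        new_centers = []
        for k in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                # Mean of the members, coordinate by coordinate.
                new_centers.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                # Keep an empty cluster's center where it was.
                new_centers.append(centers[k])
        if new_centers == centers:
            break
        centers = new_centers
    labels = assign(points, centers)
    sse = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return labels, centers, sse


# Four corners of a 4 x 1 rectangle (hypothetical toy data).
points = [(0.0, 0.0), (0.0, 1.0), (4.0, 0.0), (4.0, 1.0)]

# Start 1: one center near each short side, which splits left from right.
good = kmeans(points, [(0.0, 0.0), (4.0, 0.0)])

# Start 2: both centers on the vertical mid-line, which splits bottom from top.
bad = kmeans(points, [(2.0, 0.0), (2.0, 1.0)])

print(good[0], good[2])  # [0, 0, 1, 1] 1.0
print(bad[0], bad[2])    # [0, 1, 0, 1] 16.0
```

Both runs converge, but the second start is stuck in a partition whose error is sixteen times larger. This is the local-optimum behaviour that motivates more careful selection of the initial centers.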
