Paper Information - Effective Document Clustering with Particle Swarm Optimization

The paper presents a comparative analysis of K-means and PSO-based clustering performance on text datasets. Dimensionality reduction techniques such as stop-word removal, Brill's tagger algorithm, and mean Tf-Idf are used to reduce the number of dimensions before clustering. The results show that PSO-based approaches find better solutions than K-means because, unlike K-means, they can evaluate many candidate cluster centroids simultaneously at any given time.
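The idea that a swarm evaluates many candidate centroid sets at once can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`quantization_error`, `pso_cluster`), the fitness measure (mean distance from each point to its nearest centroid), and the parameter values (`w`, `c1`, `c2`, swarm size) are assumptions chosen for illustration, and the toy points stand in for reduced Tf-Idf document vectors.

```python
import random


def quantization_error(centroids, points):
    """Mean Euclidean distance from each point to its nearest centroid.

    Assumed fitness measure for this sketch (lower is better).
    """
    total = 0.0
    for p in points:
        total += min(
            sum((a - b) ** 2 for a, b in zip(p, c)) ** 0.5 for c in centroids
        )
    return total / len(points)


def pso_cluster(points, k, n_particles=10, iters=50,
                w=0.72, c1=1.49, c2=1.49, seed=0):
    """Hypothetical helper: search for k centroids with a basic global-best PSO."""
    rng = random.Random(seed)
    dim = len(points[0])
    # Each particle is one complete candidate solution: a set of k centroids,
    # initialised here at randomly chosen data points.
    swarm = [[list(rng.choice(points)) for _ in range(k)]
             for _ in range(n_particles)]
    vel = [[[0.0] * dim for _ in range(k)] for _ in range(n_particles)]
    pbest = [[c[:] for c in particle] for particle in swarm]
    pbest_fit = [quantization_error(particle, points) for particle in swarm]
    g = min(range(n_particles), key=pbest_fit.__getitem__)
    gbest, gbest_fit = [c[:] for c in pbest[g]], pbest_fit[g]

    for _ in range(iters):
        for i in range(n_particles):
            for j in range(k):
                for d in range(dim):
                    r1, r2 = rng.random(), rng.random()
                    # Standard velocity update: inertia + pull toward the
                    # particle's own best + pull toward the swarm's best.
                    vel[i][j][d] = (
                        w * vel[i][j][d]
                        + c1 * r1 * (pbest[i][j][d] - swarm[i][j][d])
                        + c2 * r2 * (gbest[j][d] - swarm[i][j][d])
                    )
                    swarm[i][j][d] += vel[i][j][d]
            fit = quantization_error(swarm[i], points)
            if fit < pbest_fit[i]:
                pbest[i], pbest_fit[i] = [c[:] for c in swarm[i]], fit
                if fit < gbest_fit:
                    gbest, gbest_fit = [c[:] for c in swarm[i]], fit
    return gbest, gbest_fit


# Toy 2-D data standing in for reduced document vectors (illustrative only).
points = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2), (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
centroids, error = pso_cluster(points, k=2)
```

K-means, by contrast, refines a single set of centroids from one starting point, which is the contrast the abstract draws; in this sketch every particle holds its own full set of `k` centroids and all of them are scored in each iteration.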

K. R. Chandran | K. Srinivasa Rao | Suresh Chandra Satapathy | Gunanidhi Pradhan | Ramanji Killani

[1] Huan Liu, et al. Subspace clustering for high dimensional data: a review, 2004, SKDD.

[2] Eric O. Postma, et al. Dimensionality Reduction: A Comparative Review, 2008.

[3] James Kennedy, et al. Particle swarm optimization, 1995, Proceedings of ICNN'95 - International Conference on Neural Networks.

[4] Ning Zhong, et al. Methodologies for Knowledge Discovery and Data Mining, 2002, Lecture Notes in Computer Science.

[5] J. MacQueen. Some methods for classification and analysis of multivariate observations, 1967.

[6] Huan Liu, et al. Subspace clustering for high dimensional data, 2004.

[7] Stefan Rüger, et al. Feature Reduction for Document Clustering and Classification, 2000.

[8] Tian Weixin, et al. Text Document Clustering Based on the Modifying Relations, 2008, 2008 International Conference on Computer Science and Software Engineering.

[9] Keinosuke Fukunaga, et al. Introduction to Statistical Pattern Recognition, 1972.

[10] Thomas E. Potok, et al. Document clustering using particle swarm optimization, 2005, Proceedings 2005 IEEE Swarm Intelligence Symposium (SIS 2005).