Effective Document Clustering with Particle Swarm Optimization

The paper presents a comparative analysis of K-means and PSO-based clustering performance on text datasets. Dimensionality reduction techniques such as stop-word removal, Brill's tagger algorithm, and mean TF-IDF are used to reduce the number of dimensions before clustering. The results show that PSO-based approaches find better solutions than K-means because PSO evaluates many candidate sets of cluster centroids simultaneously at each step, whereas K-means refines only a single set.
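The idea of evaluating many candidate centroid sets at once can be illustrated with a minimal sketch. It is not the paper's implementation: the function names, the fitness measure (mean distance from each document to its nearest centroid), the parameter values (w, c1, c2, swarm size), and the toy 2-D vectors standing in for reduced TF-IDF document vectors are all assumptions chosen for illustration.

```python
import math
import random


def quantization_error(centroids, docs):
    """Mean Euclidean distance from each document vector to its nearest centroid."""
    return sum(min(math.dist(d, c) for c in centroids) for d in docs) / len(docs)


def pso_cluster(docs, k, n_particles=10, n_iters=50, w=0.72, c1=1.49, c2=1.49, seed=0):
    """Global-best PSO in which every particle is a full set of k centroids.

    All parameter defaults are illustrative assumptions, not values from the paper.
    """
    rng = random.Random(seed)
    dim = len(docs[0])

    # Each particle starts as k centroids copied from randomly chosen documents.
    positions = [[list(rng.choice(docs)) for _ in range(k)] for _ in range(n_particles)]
    velocities = [[[0.0] * dim for _ in range(k)] for _ in range(n_particles)]

    # Personal bests and the swarm-wide best (lower fitness is better).
    pbest = [[c[:] for c in p] for p in positions]
    pbest_fit = [quantization_error(p, docs) for p in positions]
    g = min(range(n_particles), key=pbest_fit.__getitem__)
    gbest, gbest_fit = [c[:] for c in pbest[g]], pbest_fit[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for j in range(k):
                for d in range(dim):
                    r1, r2 = rng.random(), rng.random()
                    # Inertia + pull toward own best + pull toward swarm best.
                    velocities[i][j][d] = (
                        w * velocities[i][j][d]
                        + c1 * r1 * (pbest[i][j][d] - positions[i][j][d])
                        + c2 * r2 * (gbest[j][d] - positions[i][j][d])
                    )
                    positions[i][j][d] += velocities[i][j][d]
            fit = quantization_error(positions[i], docs)
            if fit < pbest_fit[i]:
                pbest[i], pbest_fit[i] = [c[:] for c in positions[i]], fit
                if fit < gbest_fit:
                    gbest, gbest_fit = [c[:] for c in positions[i]], fit
    return gbest, gbest_fit


# Toy stand-in for reduced document vectors: two well-separated groups.
docs = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]]
centroids, fitness = pso_cluster(docs, k=2)
print(centroids, fitness)
```

In this sketch the swarm holds ten complete clusterings per iteration and shares the best one found so far, while a K-means run would update only one set of centroids from a single starting point.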
