Combination of Fuzzy C-Means and Particle Swarm Optimization for Text Document Clustering

Document clustering, an important tool for document organization and browsing, has become an active field of research in the machine learning community. Fuzzy c-means (FCM), a powerful unsupervised clustering algorithm, has been widely used for categorization problems. However, as an optimization algorithm, it is easily trapped in local optima. Particle swarm optimization (PSO) is a stochastic global optimization technique. This paper presents a hybrid approach for text document clustering based on fuzzy c-means and particle swarm optimization (PSO-FCM), which combines the merits of both algorithms. PSO-FCM not only helps FCM clustering escape from local optima but also overcomes the slow convergence of the PSO algorithm. Experimental results on two commonly used data sets show that the proposed method outperforms both the FCM and PSO algorithms.
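The abstract does not spell out how the two algorithms are combined, so the following is only a minimal sketch of one common way to hybridize them, not the authors' method. It assumes that each particle encodes a full set of cluster centers, that fitness is the standard FCM objective, that a PSO search phase is followed by an FCM refinement of the best particle, and that documents are already represented as numeric vectors compared by squared Euclidean distance. The function names (`pso_fcm`, `fcm_refine`, and so on) and all parameter values are hypothetical choices for illustration.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def memberships(data, centers, m=2.0):
    """Standard FCM membership update: u_ij is proportional to D_ij^(-1/(m-1)),
    where D_ij is the squared distance from point i to center j."""
    eps = 1e-12  # avoids division by zero when a point coincides with a center
    U = []
    for x in data:
        inv = [(sq_dist(x, c) + eps) ** (-1.0 / (m - 1.0)) for c in centers]
        total = sum(inv)
        U.append([v / total for v in inv])
    return U

def objective(data, centers, m=2.0):
    """FCM objective J_m, evaluated with the optimal memberships for these centers."""
    U = memberships(data, centers, m)
    return sum(U[i][j] ** m * sq_dist(x, c)
               for i, x in enumerate(data) for j, c in enumerate(centers))

def fcm_refine(data, centers, m=2.0, iters=20):
    """Alternate the FCM membership and center updates for a fixed number of steps."""
    n, dim = len(data), len(data[0])
    for _ in range(iters):
        U = memberships(data, centers, m)
        new_centers = []
        for j in range(len(centers)):
            w = [U[i][j] ** m for i in range(n)]
            total = sum(w)
            new_centers.append(
                [sum(w[i] * data[i][d] for i in range(n)) / total for d in range(dim)])
        centers = new_centers
    return centers

def pso_fcm(data, k, n_particles=10, pso_iters=30, fcm_iters=20,
            m=2.0, w=0.72, c1=1.49, c2=1.49, seed=0):
    """Assumed hybrid: global-best PSO over center sets, then FCM refinement."""
    rng = random.Random(seed)
    dim = len(data[0])
    size = k * dim

    def unflat(p):
        # A particle is a flat list of k * dim coordinates.
        return [p[j * dim:(j + 1) * dim] for j in range(k)]

    # Initialize each particle from k randomly chosen data points.
    pos = [[v for c in rng.sample(data, k) for v in c] for _ in range(n_particles)]
    vel = [[0.0] * size for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_f = [objective(data, unflat(p), m) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_f[i])
    gbest, gbest_f = pbest[g][:], pbest_f[g]

    for _ in range(pso_iters):
        for i in range(n_particles):
            for d in range(size):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            f = objective(data, unflat(pos[i]), m)
            if f < pbest_f[i]:
                pbest[i], pbest_f[i] = pos[i][:], f
                if f < gbest_f:
                    gbest, gbest_f = pos[i][:], f

    # Local refinement of the best PSO solution with plain FCM iterations.
    centers = fcm_refine(data, unflat(gbest), m, fcm_iters)
    return centers, objective(data, centers, m)
```

Calling `pso_fcm(vectors, k)` on a list of equal-length numeric vectors returns the refined centers and their objective value. For real text data the vectors would typically come from a term-weighting scheme, and a cosine-based dissimilarity may be more appropriate than the Euclidean distance assumed here.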
