Text document clustering using Spectral Clustering algorithm with Particle Swarm Optimization

Abstract: Document clustering is the grouping of text documents into clusters. The main aim is to form clusters that are internally coherent but considerably different from each other. It is a crucial process in information retrieval, information extraction and document organization. In recent years, spectral clustering has been widely applied in machine learning as an innovative clustering technique. This research work proposes a novel Spectral Clustering algorithm with Particle Swarm Optimization (SCPSO) to improve text document clustering. Randomization is carried out on the initial population, taking both global and local optimization functions into account. The work aims to combine spectral clustering with swarm optimization in order to handle large volumes of text documents. The proposed SCPSO algorithm is evaluated on a benchmark database and compared with Spherical K-means, the Expectation Maximization (EM) method and the standard PSO algorithm. The results show that SCPSO yields better clustering accuracy than these other clustering techniques.
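The abstract does not spell out how the spectral step and the swarm step are connected, so the following is only a minimal sketch of one plausible way to combine them, not the authors' SCPSO. It assumes cosine similarity as the affinity, a symmetric normalized affinity matrix with row-normalized top eigenvectors as the embedding, and a PSO whose particles each encode k candidate centroids scored by within-cluster squared distance. The function names, the fitness function and the parameter values (w, c1, c2, swarm size) are illustrative assumptions.

```python
import numpy as np

def spectral_embedding(X, k):
    """Embed documents (rows of X, e.g. term counts) into k dimensions.

    Assumption: cosine affinity and the normalized matrix D^-1/2 W D^-1/2.
    """
    eps = 1e-12
    Xn = X / (np.linalg.norm(X, axis=1, keepdims=True) + eps)
    W = Xn @ Xn.T                      # cosine similarity between documents
    np.fill_diagonal(W, 0.0)           # no self-loops in the similarity graph
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1) + eps)
    M = d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(M)        # eigenvalues in ascending order
    U = vecs[:, -k:]                   # eigenvectors of the k largest eigenvalues
    return U / (np.linalg.norm(U, axis=1, keepdims=True) + eps)

def pso_cluster(Y, k, n_particles=20, n_iter=50, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Search for k centroids in the embedding Y with a basic PSO (illustrative)."""
    rng = np.random.default_rng(seed)
    n = Y.shape[0]

    def fitness(centroids):
        # Sum of squared distances from each point to its nearest centroid.
        d2 = ((Y[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        return d2.min(axis=1).sum()

    # Random initial population: each particle starts at k randomly chosen points.
    pos = np.stack([Y[rng.choice(n, size=k, replace=False)]
                    for _ in range(n_particles)])
    vel = np.zeros_like(pos)
    pbest = pos.copy()                                  # local (personal) bests
    pbest_fit = np.array([fitness(p) for p in pos])
    g = pbest_fit.argmin()
    gbest, gbest_fit = pbest[g].copy(), pbest_fit[g]    # global best

    for _ in range(n_iter):
        r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = pos + vel
        fit = np.array([fitness(p) for p in pos])
        improved = fit < pbest_fit
        pbest[improved] = pos[improved]
        pbest_fit[improved] = fit[improved]
        g = pbest_fit.argmin()
        if pbest_fit[g] < gbest_fit:
            gbest, gbest_fit = pbest[g].copy(), pbest_fit[g]

    d2 = ((Y[:, None, :] - gbest[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1), gbest_fit

# Toy term-count matrix: two groups of documents with disjoint vocabularies.
X = np.array([[2, 1, 0, 0], [1, 2, 0, 0], [3, 1, 0, 0],
              [0, 0, 1, 2], [0, 0, 2, 1], [0, 0, 1, 3]], dtype=float)
labels, best_fit = pso_cluster(spectral_embedding(X, k=2), k=2)
print(labels)
```

In this sketch the personal-best and global-best terms of the velocity update play the roles of the local and global optimization functions mentioned in the abstract; a real implementation would more likely start from TF-IDF vectors and a sparse eigensolver to cope with large collections.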
