PSOHS: an efficient two-stage approach for data clustering

Cluster analysis is an important task in data mining. It refers to grouping a set of objects so that similarities among objects within the same group are maximized while similarities among objects in different groups are minimized. Particle swarm optimization (PSO) is one of the best-known metaheuristic optimization algorithms and has been successfully applied to the clustering problem. However, it has two major shortcomings. First, PSO converges rapidly during the initial stages of the search, but its convergence becomes very slow near the global optimum. Second, it may become trapped in a local optimum if the global best and local best values remain equal to a particle's position over a certain number of iterations. In this paper, we hybridize PSO with a heuristic search algorithm to overcome these shortcomings. In the proposed algorithm, called PSOHS, PSO is used to produce an initial solution to the clustering problem, and a heuristic search algorithm is then applied to improve the quality of this solution by searching around it. The superiority of PSOHS over other popular clustering methods is demonstrated on seven benchmark and real datasets: Iris, Wine, Crude Oil, Cancer, CMC, Glass and Vowel.
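The abstract describes the two stages only at a high level. The Python sketch below illustrates the general pattern under stated assumptions: the clustering objective is taken to be the sum of Euclidean distances from each object to its nearest centroid, stage 1 is a standard global-best PSO in which each particle encodes k centroids, and stage 2 is a simple coordinate pattern search around the PSO solution. The pattern search is a hypothetical stand-in, not the heuristic search used in PSOHS, and the function names and parameter values (`w`, `c1`, `c2`, `step`, `shrink`) are illustrative choices.

```python
import math
import random
from typing import List, Sequence

Point = List[float]


def clustering_cost(data: Sequence[Point], centroids: Sequence[Point]) -> float:
    """Assumed objective: sum of distances from each point to its nearest centroid."""
    return sum(min(math.dist(x, c) for c in centroids) for x in data)


def pso_stage(data, k, n_particles=20, iters=100, w=0.72, c1=1.49, c2=1.49, rng=None):
    """Stage 1: global-best PSO; each particle is a list of k candidate centroids."""
    rng = rng or random.Random(0)
    dim = len(data[0])
    lo = [min(x[d] for x in data) for d in range(dim)]
    hi = [max(x[d] for x in data) for d in range(dim)]

    # Initialize every centroid uniformly inside the data's bounding box.
    pos = [[[rng.uniform(lo[d], hi[d]) for d in range(dim)] for _ in range(k)]
           for _ in range(n_particles)]
    vel = [[[0.0] * dim for _ in range(k)] for _ in range(n_particles)]
    pbest = [[c[:] for c in p] for p in pos]
    pbest_cost = [clustering_cost(data, p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_cost[i])
    gbest, gbest_cost = [c[:] for c in pbest[g]], pbest_cost[g]

    for _ in range(iters):
        for i in range(n_particles):
            for j in range(k):
                for d in range(dim):
                    r1, r2 = rng.random(), rng.random()
                    # Inertia + pull toward personal best + pull toward global best.
                    vel[i][j][d] = (w * vel[i][j][d]
                                    + c1 * r1 * (pbest[i][j][d] - pos[i][j][d])
                                    + c2 * r2 * (gbest[j][d] - pos[i][j][d]))
                    pos[i][j][d] += vel[i][j][d]
            cost = clustering_cost(data, pos[i])
            if cost < pbest_cost[i]:
                pbest[i], pbest_cost[i] = [c[:] for c in pos[i]], cost
                if cost < gbest_cost:  # asynchronous global-best update
                    gbest, gbest_cost = [c[:] for c in pos[i]], cost
    return gbest, gbest_cost


def local_search_stage(data, centroids, step=1.0, shrink=0.5, min_step=1e-3):
    """Stage 2 (hypothetical stand-in): move one centroid coordinate by +/- step,
    keep any improving move, and shrink the step when no move improves."""
    best = [c[:] for c in centroids]
    best_cost = clustering_cost(data, best)
    while step > min_step:
        improved = False
        for j in range(len(best)):
            for d in range(len(best[j])):
                for delta in (step, -step):
                    cand = [c[:] for c in best]
                    cand[j][d] += delta
                    cost = clustering_cost(data, cand)
                    if cost < best_cost:
                        best, best_cost, improved = cand, cost, True
        if not improved:
            step *= shrink
    return best, best_cost


if __name__ == "__main__":
    data = [[1.0, 1.0], [1.2, 0.8], [0.9, 1.1], [5.0, 5.0], [5.1, 4.8], [4.9, 5.2]]
    init, init_cost = pso_stage(data, k=2)
    final, final_cost = local_search_stage(data, init)
    print(init_cost, final_cost)
```

Because the second stage accepts only improving moves, it can never make the PSO solution worse, and its shrinking step lets it refine the centroids at a finer scale once the swarm has slowed down near an optimum, which is the first shortcoming the abstract targets.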
