PSO aided k-means clustering: introducing connectivity in k-means

Clustering is a fundamental and hence widely studied problem in data analysis. In a multi-objective perspective, this paper combines principles from two different clustering paradigms: the connectivity principle from density-based methods is integrated into the partitional clustering approach. The standard k-Means algorithm is hybridized with Particle Swarm Optimization. The new method (PSO-kMeans) benefits from both a local and a global view on data and alleviates some drawbacks of the k-Means algorithm; thus, it is able to spot types of clusters which are otherwise difficult to obtain (elongated shapes, non-similar volumes). Our experimental results show that PSO-kMeans improves the performance of standard k-Means in all test cases and performs at least comparable to state-of-the-art methods in the worst case. PSO-kMeans is robust to outliers. This comes at a cost: the preprocessing step for finding the nearest neighbors for each data item is required, which increases the initial linear complexity of k-Means to quadratic complexity.

[1]  Henri Luchian,et al.  Evolutionary automated classification , 1994, Proceedings of the First IEEE Conference on Evolutionary Computation. IEEE World Congress on Computational Intelligence.

[2]  P. Rousseeuw Silhouettes: a graphical aid to the interpretation and validation of cluster analysis , 1987 .

[3]  Mario Köppen,et al.  Data Swarm Clustering , 2006, Swarm Intelligence in Data Mining.

[4]  Donald W. Bouldin,et al.  A Cluster Separation Measure , 1979, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[5]  James Kennedy,et al.  Particle swarm optimization , 2002, Proceedings of ICNN'95 - International Conference on Neural Networks.

[6]  Dan Dumitrescu,et al.  Evolutionary prototype selection. , 2003 .

[7]  Olfa Nasraoui,et al.  Unsupervised Niche Clustering: Discovering an Unknown Number of Clusters in Noisy Data Sets , 2005 .

[8]  Henri Luchian,et al.  A unifying criterion for unsupervised clustering and feature selection , 2011, Pattern Recognit..

[9]  Taher Niknam,et al.  An efficient hybrid approach based on PSO, ACO and k-means for cluster analysis , 2010, Appl. Soft Comput..

[10]  Lenuta Alboaie,et al.  Guiding users within trust networks using swarm algorithms , 2009, 2009 IEEE Congress on Evolutionary Computation.

[11]  R. Krovi,et al.  Genetic algorithms for clustering: a preliminary investigation , 1992, Proceedings of the Twenty-Fifth Hawaii International Conference on System Sciences.

[12]  Marco Dorigo,et al.  Ant-Based Clustering and Topographic Mapping , 2006, Artificial Life.

[13]  Ali M. S. Zalzala,et al.  A genetic rule-based data clustering toolkit , 2002, Proceedings of the 2002 Congress on Evolutionary Computation. CEC'02 (Cat. No.02TH8600).

[14]  Thomas E. Potok,et al.  Document clustering using particle swarm optimization , 2005, Proceedings 2005 IEEE Swarm Intelligence Symposium, 2005. SIS 2005..

[15]  Ajith Abraham,et al.  Swarm Intelligence Algorithms for Data Clustering , 2008, Soft Computing for Knowledge Discovery and Data Mining.

[16]  Joshua D. Knowles,et al.  Improvements to the scalability of multiobjective clustering , 2005, 2005 IEEE Congress on Evolutionary Computation.

[17]  James C. Bezdek,et al.  Genetic algorithm guided clustering , 1994, Proceedings of the First IEEE Conference on Evolutionary Computation. IEEE World Congress on Computational Intelligence.

[18]  Donald R. Jones,et al.  Solving Partitioning Problems with Genetic Algorithms , 1991, International Conference on Genetic Algorithms.

[19]  Daniela Zaharie Density based clustering with crowding differential evolution , 2005, Seventh International Symposium on Symbolic and Numeric Algorithms for Scientific Computing (SYNASC'05).