Improved Particle Swarm Optimization Based K-Means Clustering

Clustering is a popular data analysis and data mining technique. K-Means is one of the most popular data mining algorithms because it is simple, scalable, and easily adapted to a variety of contexts and application domains. The major drawbacks of the traditional K-Means algorithm are that its performance depends on the choice of initial centroids and that the number of clusters must be specified in advance. Many evolutionary clustering algorithms have been developed in recent years to select good initial centroids and thereby improve clustering results. Particle Swarm Optimization (PSO) is a population-based metaheuristic, motivated by memetic evolution, that mimics the collective behavior of a swarm such as a flock of birds. The K-Means algorithm typically uses the Euclidean or squared Euclidean distance to measure the distortion between a data object and its cluster centroid. These distances are usually computed from raw rather than standardized data, so attributes with large numeric ranges can dominate the result. Normalization is an important preprocessing step that transforms the values of all attributes onto a comparable scale, and effective clustering depends on applying an equally effective normalization technique. This paper proposes an effective partitional clustering algorithm that integrates the merits of PSO and normalization with the traditional K-Means algorithm. Experiments on real-world data are conducted to demonstrate the efficiency of the proposed algorithm.
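To illustrate why normalization matters for a Euclidean-distance K-Means, here is a minimal plain-Python sketch. It assumes min-max scaling to [0, 1] as the normalization scheme (the abstract does not say which scheme the proposed algorithm uses) and Lloyd's K-Means with squared Euclidean distance as the distortion measure. The function names `min_max_normalize`, `sq_euclidean`, and `kmeans` and the small age/income dataset are hypothetical, chosen only for illustration.

```python
def min_max_normalize(data):
    """Rescale each attribute (column) to [0, 1].

    Assumption: min-max scaling; the paper's normalization scheme may differ.
    A constant column is mapped to 0.0 to avoid dividing by zero.
    """
    lows = [min(col) for col in zip(*data)]
    highs = [max(col) for col in zip(*data)]
    return [[(x - lo) / (hi - lo) if hi > lo else 0.0
             for x, lo, hi in zip(row, lows, highs)]
            for row in data]


def sq_euclidean(a, b):
    """Squared Euclidean distance: the K-Means distortion for one point."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(data, centroids, max_iter=100):
    """Lloyd's K-Means from the given initial centroids.

    Returns (centroids, labels, sse), where sse is the total distortion.
    """
    centroids = [list(c) for c in centroids]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [min(range(len(centroids)),
                          key=lambda j: sq_euclidean(p, centroids[j]))
                      for p in data]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, lab in zip(data, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(v) / len(members) for v in zip(*members)]
    sse = sum(sq_euclidean(p, centroids[lab]) for p, lab in zip(data, labels))
    return centroids, labels, sse


# Hypothetical data: age in years next to income in currency units.
raw = [[25, 30000], [27, 31000], [62, 30500], [60, 31500]]
scaled = min_max_normalize(raw)
print(kmeans(raw, [raw[0], raw[2]])[1])           # → [0, 1, 1, 1] (income dominates)
print(kmeans(scaled, [scaled[0], scaled[2]])[1])  # → [0, 0, 1, 1] (split by age)
```

With the same starting points, the raw run is driven almost entirely by the income column, because its differences are thousands of times larger than the age differences; after scaling, both attributes span [0, 1] and the partition follows the age groups instead.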

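The PSO-based initialization described in the abstract can be sketched as a global-best PSO in which each particle encodes a full set of k centroids and its fitness is the K-Means distortion (sum of squared errors). The best set found then seeds ordinary K-Means. This follows the general PSO-then-K-Means hybrid pattern and is not necessarily the authors' exact method. The particle encoding, SSE fitness, position clamping, and the parameter values w = 0.72 and c1 = c2 = 1.49 are assumptions, and `pso_centroids`, `kmeans_refine`, and `sse` are hypothetical names.

```python
import random


def sse(data, centroids):
    """Sum of squared Euclidean distances from each point to its nearest centroid."""
    return sum(min(sum((x - c) ** 2 for x, c in zip(p, cen)) for cen in centroids)
               for p in data)


def pso_centroids(data, k, n_particles=10, iters=50, w=0.72, c1=1.49, c2=1.49, seed=0):
    """Global-best PSO over sets of k centroids; returns (best centroids, best SSE).

    Assumptions (not from the paper): particle = k centroids, fitness = SSE,
    positions clamped to the data's bounding box, w/c1/c2 set to common defaults.
    """
    rng = random.Random(seed)
    dim = len(data[0])
    lows = [min(col) for col in zip(*data)]
    highs = [max(col) for col in zip(*data)]
    # Each particle starts at k distinct data points with zero velocity.
    pos = [[list(p) for p in rng.sample(data, k)] for _ in range(n_particles)]
    vel = [[[0.0] * dim for _ in range(k)] for _ in range(n_particles)]
    pbest = [[list(c) for c in part] for part in pos]
    pbest_fit = [sse(data, part) for part in pos]
    g = min(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = [list(c) for c in pbest[g]], pbest_fit[g]
    for _ in range(iters):
        for i in range(n_particles):
            for j in range(k):
                for d in range(dim):
                    r1, r2 = rng.random(), rng.random()
                    # Inertia + pull toward personal best + pull toward global best.
                    vel[i][j][d] = (w * vel[i][j][d]
                                    + c1 * r1 * (pbest[i][j][d] - pos[i][j][d])
                                    + c2 * r2 * (gbest[j][d] - pos[i][j][d]))
                    # Move, then clamp the coordinate to the data's range.
                    pos[i][j][d] = min(max(pos[i][j][d] + vel[i][j][d], lows[d]), highs[d])
            fit = sse(data, pos[i])
            if fit < pbest_fit[i]:
                pbest[i], pbest_fit[i] = [list(c) for c in pos[i]], fit
                if fit < gbest_fit:
                    gbest, gbest_fit = [list(c) for c in pos[i]], fit
    return gbest, gbest_fit


def kmeans_refine(data, centroids, max_iter=100):
    """Standard Lloyd iterations starting from the PSO-selected centroids."""
    centroids = [list(c) for c in centroids]
    for _ in range(max_iter):
        clusters = [[] for _ in centroids]
        for p in data:
            nearest = min(range(len(centroids)),
                          key=lambda idx: sum((x - c) ** 2 for x, c in zip(p, centroids[idx])))
            clusters[nearest].append(p)
        # New centroid = mean of its members; an empty cluster keeps its old centroid.
        new = [[sum(v) / len(m) for v in zip(*m)] if m else c
               for m, c in zip(clusters, centroids)]
        if new == centroids:
            break
        centroids = new
    return centroids


data = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
        [5.0, 5.0], [5.1, 5.0], [5.0, 5.1],
        [0.0, 5.0], [0.1, 5.0], [0.0, 5.1]]
seed_centroids, seed_sse = pso_centroids(data, k=3)
final = kmeans_refine(data, seed_centroids)
print(seed_sse, sse(data, final))
```

Because Lloyd's iterations never increase the distortion, the refined centroids are at least as good as the PSO seed; the PSO stage only aims to start K-Means in a better region than a single random initialization would. In the pipeline the abstract describes, the data would be normalized before this search.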