Automatic Clustering Using an Improved Particle Swarm Optimization

Unsupervised data clustering is an important task in data mining. Many clustering algorithms have been proposed, yet most of them require a predefined number of clusters. Unfortunately, in real-world problems the number of clusters is often not known in advance. This paper therefore proposes an algorithm for automatic clustering. The proposed algorithm is based on a population-based heuristic method, particle swarm optimization (PSO). It addresses the two main issues in automatic clustering: determining the number of clusters and locating the cluster centroids. In automatic clustering using PSO (ACPSO), exploration is carried out by particles that each comprise two sections. A time-varying tuning parameter is applied, a sigmoid function is employed to handle infeasible solutions, and K-means is applied to adjust the cluster centroids. Validation on four benchmark datasets shows that ACPSO outperforms three previous methods, namely DCPSO, DCPG, and DCGA. Overall, ACPSO achieves better accuracy and consistency.
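The components named in the abstract can be illustrated with a minimal sketch. The abstract does not give the particle layout, the parameter schedule, or the repair rule, so everything below is an assumption made for illustration: a particle is taken to hold `k_max` activation values followed by `k_max * dim` centroid coordinates, a sigmoid with a 0.5 threshold decides which centroids are active, a linearly decreasing inertia weight stands in for the time-varying parameter, and a single K-means step refines the active centroids. All function names and default values are hypothetical.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def decode(particle, k_max, dim):
    """Split a particle into its two assumed sections and return active centroids.

    Section 1 (assumed): k_max activation values.
    Section 2 (assumed): k_max * dim centroid coordinates.
    """
    activation = particle[:k_max]
    coords = particle[k_max:]
    centroids = [coords[j * dim:(j + 1) * dim] for j in range(k_max)]
    probs = [sigmoid(a) for a in activation]
    active = [j for j, p in enumerate(probs) if p > 0.5]
    if len(active) < 2:
        # Assumed repair rule for an infeasible particle (fewer than two
        # clusters): keep the two slots with the largest sigmoid values.
        ranked = sorted(range(k_max), key=lambda j: probs[j], reverse=True)
        active = sorted(ranked[:2])
    return [centroids[j] for j in active]


def inertia(t, t_max, w_start=0.9, w_end=0.4):
    """Assumed time-varying parameter: inertia weight decreasing linearly."""
    return w_start - (w_start - w_end) * t / t_max


def kmeans_step(points, centroids):
    """One K-means update: assign points to the nearest centroid, then average."""
    groups = [[] for _ in centroids]
    for p in points:
        dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
        groups[dists.index(min(dists))].append(p)
    updated = []
    for group, old in zip(groups, centroids):
        if group:
            updated.append([sum(col) / len(group) for col in zip(*group)])
        else:
            updated.append(list(old))  # leave an empty cluster where it was
    return updated
```

In a full PSO loop, each particle would be decoded, refined with `kmeans_step`, and scored with a cluster validity index before the velocity update. Which index is used, and how the refined centroids are written back into the particle, are not stated in the abstract and are left open here.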
