A Hybrid Approach to Data Clustering Analysis with K-Means and Enhanced Ant-Based Template Mechanism

Data clustering algorithms play an important role in the effective analysis and organization of massive amounts of information. The K-means algorithm is the most commonly used partitional data clustering algorithm because of its simple implementation and fast convergence. However, it does not always converge to the global optimum; the result depends on how the data items are initially distributed. The Ant-based Template Mechanism (Ant_TM) is another frequently used clustering algorithm, but it exhibits two major weaknesses, in its convergence rate and in the data purity of its clustering results. In this paper, we first present a modification to the original Ant_TM that encourages the formation of new cluster regions, enabling the clustering result to move away from local optima. Second, we present two hybrid clustering algorithms based on the enhanced Ant_TM and the K-means algorithm. The rationale is that integrating the K-means algorithm can speed up convergence and provide a perturbation that breaks free from locally optimal clusterings. We conduct experiments to compare the performance of our hybrid algorithms against the enhanced Ant_TM and the K-means algorithm, as well as PSO+K and GA. The results show that our algorithms outperform the original Ant_TM, K-means, and PSO+K, and are competitive with GA in producing more compact and better separated clusters.
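The dependence of K-means on its starting point can be illustrated with a minimal sketch. This is plain Lloyd-style K-means written for illustration only, not the hybrid algorithms or the Ant_TM proposed in the paper; the function names (`sq_dist`, `kmeans`) and the four-point toy data set are assumptions introduced here.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(
    points: List[Point],
    init_centroids: List[Point],
    max_iter: int = 100,
) -> Tuple[List[Point], List[int], float]:
    """Plain K-means from caller-supplied initial centroids (illustrative helper).

    Returns the final centroids, the cluster label of each point, and the
    sum of squared errors (SSE) of the final partition.
    """
    centroids = [tuple(c) for c in init_centroids]
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
    sse = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return centroids, labels, sse


# Toy data (assumed for illustration): corners of a wide 4 x 1 rectangle.
points = [(0, 0), (0, 1), (4, 0), (4, 1)]

# Initialization A splits the data left/right and reaches the best partition.
_, labels_a, sse_a = kmeans(points, [(0, 0.5), (4, 0.5)])

# Initialization B splits the data bottom/top and stays stuck there.
_, labels_b, sse_b = kmeans(points, [(2, 0), (2, 1)])

print(labels_a, sse_a)  # [0, 0, 1, 1] 1.0
print(labels_b, sse_b)  # [0, 1, 0, 1] 16.0
```

Both runs converge, but the second stops at a partition whose SSE is sixteen times larger. A perturbation of the assignments, which is the role the abstract assigns to combining K-means with the ant-based mechanism, is one way such a fixed point could be escaped.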
