Automatic Data Clustering Analysis of Arbitrary Shape with K-Means and Enhanced Ant-Based Template Mechanism

With the advancement of miniature sensors, wireless networking and context awareness, data-intensive computing is growing in importance, with practical applications such as web categorization and data mining. One of the critical challenges in data-intensive computing is data clustering, as an effective clustering algorithm enables researchers and automated systems to analyze and organize massive amounts of data much more efficiently. Many data clustering algorithms already exist, but most require a priori knowledge of the number of classes to guide the clustering process. We propose Auto_Ant_TMs_Shape, a two-phase algorithm that automatically forms an optimal number of clusters with arbitrary shapes. The first phase uses a hybrid approach of K-means and an enhanced Ant-based template mechanism to generate small seed clusters, each with high purity. In the second phase, a merging algorithm iteratively merges the small clusters to obtain the final clusters. We apply Auto_Ant_TMs_Shape to eight widely used datasets and compare the clustering results with two approaches based on a density-based algorithm (DBSCAN) and Particle Swarm Optimization (PSO). The results show that Auto_Ant_TMs_Shape is very effective, achieving good clustering results with a near-optimal number of clusters without knowing the number of classes in advance.
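The two-phase idea (over-segment into small pure seeds, then merge them into arbitrarily shaped clusters) can be illustrated with a minimal sketch. This is not the Auto_Ant_TMs_Shape algorithm itself: plain K-means with a deterministic farthest-first initialization stands in for the hybrid K-means and ant-based template phase, and a single-link distance threshold is an assumed merge criterion. The names `farthest_first`, `kmeans`, `merge_seeds` and the `threshold` parameter are hypothetical and chosen only for illustration.

```python
import math


def farthest_first(points, k):
    """Pick k initial centroids: each new one is the point farthest from those chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(nxt)
    return centroids


def kmeans(points, k, iters=20):
    """Phase 1 stand-in: plain Lloyd's K-means producing k small seed clusters."""
    centroids = farthest_first(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign every point to its nearest centroid.
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        # Move each centroid to the mean of its members (empty clusters keep their centroid).
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels


def merge_seeds(points, labels, threshold):
    """Phase 2 stand-in: merge seed clusters whose closest pair of points is within threshold."""
    ids = sorted(set(labels))
    parent = {i: i for i in ids}

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    groups = {i: [p for p, lab in zip(points, labels) if lab == i] for i in ids}
    for a in ids:
        for b in ids:
            if a < b:
                gap = min(math.dist(p, q) for p in groups[a] for q in groups[b])
                if gap <= threshold:
                    parent[find(a)] = find(b)
    # Each point's final label is the representative of its seed's merged group.
    return [find(lab) for lab in labels]


# Two elongated groups; the number of final clusters is not given in advance.
pts = [(i * 0.5, 0.0) for i in range(20)] + [(i * 0.5, 10.0) for i in range(20)]
seeds = kmeans(pts, 6)
final = merge_seeds(pts, seeds, threshold=1.0)
print(len(set(seeds)), "seed clusters merged into", len(set(final)), "final clusters")
```

Under a single-link criterion, merging repeatedly until no pair of clusters is within the threshold gives the same result as the one-pass union-find above, which is why the sketch needs no explicit loop. The merged clusters follow chains of nearby seeds, so they are not restricted to the convex shapes that K-means alone produces.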
