Clustering Categorical Data Using Silhouette Coefficient as a Relocating Measure

Cluster analysis is an unsupervised learning method that constitutes a cornerstone of an intelligent data analysis process. Clustering categorical data is an important research area data mining. In this paper we propose a novel algorithm to cluster categorical data. Based on the minimum dissimilarity value objects are grouped into cluster. In the merging process, the objects are relocated using silhouette coefficient. Experimental results show that the proposed method is efficient.

[1]  R. Eberhart,et al.  Comparing inertia weights and constriction factors in particle swarm optimization , 2000, Proceedings of the 2000 Congress on Evolutionary Computation. CEC00 (Cat. No.00TH8512).

[2]  Pieter Adriaans,et al.  Data mining , 1996 .

[3]  R. Eberhart,et al.  Fuzzy adaptive particle swarm optimization , 2001, Proceedings of the 2001 Congress on Evolutionary Computation (IEEE Cat. No.01TH8546).

[4]  Andries Petrus Engelbrecht,et al.  Fundamentals of Computational Swarm Intelligence , 2005 .

[5]  A. K. Pujari,et al.  Data Mining Techniques , 2006 .

[6]  Yoshikazu Fukuyama,et al.  Practical distribution state estimation using hybrid particle swarm optimization , 2001, 2001 IEEE Power Engineering Society Winter Meeting. Conference Proceedings (Cat. No.01CH37194).

[7]  Kalyan Veeramachaneni,et al.  Fitness-distance-ratio based particle swarm optimization , 2003, Proceedings of the 2003 IEEE Swarm Intelligence Symposium. SIS'03 (Cat. No.03EX706).

[8]  Yue Shi,et al.  A modified particle swarm optimizer , 1998, 1998 IEEE International Conference on Evolutionary Computation Proceedings. IEEE World Congress on Computational Intelligence (Cat. No.98TH8360).

[9]  Yuhui Shi,et al.  Particle swarm optimization: developments, applications and resources , 2001, Proceedings of the 2001 Congress on Evolutionary Computation (IEEE Cat. No.01TH8546).

[10]  Pavel Berkhin,et al.  A Survey of Clustering Data Mining Techniques , 2006, Grouping Multidimensional Data.

[11]  Johannes Gehrke,et al.  CACTUS—clustering categorical data using summaries , 1999, KDD '99.

[12]  Joshua Zhexue Huang,et al.  A Fast Clustering Algorithm to Cluster Very Large Categorical Data Sets in Data Mining , 1997, DMKD.

[13]  M. Clerc,et al.  The swarm and the queen: towards a deterministic and adaptive particle swarm optimization , 1999, Proceedings of the 1999 Congress on Evolutionary Computation-CEC99 (Cat. No. 99TH8406).

[14]  Hirotaka Yoshida,et al.  A particle swarm optimization for reactive power and voltage control in electric power systems considering voltage security assessment , 1999, IEEE SMC'99 Conference Proceedings. 1999 IEEE International Conference on Systems, Man, and Cybernetics (Cat. No.99CH37028).

[15]  James Kennedy,et al.  Particle swarm optimization , 2002, Proceedings of ICNN'95 - International Conference on Neural Networks.

[16]  Yi Li,et al.  COOLCAT: an entropy-based algorithm for categorical clustering , 2002, CIKM '02.

[17]  Maurice Clerc,et al.  The particle swarm - explosion, stability, and convergence in a multidimensional complex space , 2002, IEEE Trans. Evol. Comput..

[18]  Ohn Mar San,et al.  An alternative extension of the k-means algorithm for clustering categorical data , 2004 .

[19]  Petra Perner,et al.  Data Mining - Concepts and Techniques , 2002, Künstliche Intell..

[20]  James Kennedy,et al.  The particle swarm: social adaptation of knowledge , 1997, Proceedings of 1997 IEEE International Conference on Evolutionary Computation (ICEC '97).

[21]  Joshua Zhexue Huang,et al.  Extensions to the k-Means Algorithm for Clustering Large Data Sets with Categorical Values , 1998, Data Mining and Knowledge Discovery.