Optimal Clustering Method Based on Genetic Algorithm

Clustering methods divide a dataset into groups, called clusters, such that objects in the same cluster are similar to one another and dissimilar to objects in other clusters. Clustering algorithms can be hierarchical or partitional. Partitional clustering methods decompose the dataset into a set of disjoint clusters. Most partitional approaches assume that the number of clusters is known a priori, and they are also sensitive to initialization. Hierarchical clustering methods produce a complete sequence of clustering solutions, either from singleton clusters up to a single cluster containing all individuals, or vice versa. A hierarchical clustering can be represented by a dendrogram, which can be cut at different levels to obtain different numbers of clusters of corresponding granularity. If a dataset has a large multilevel hierarchy, it becomes difficult to determine the optimal clustering by cutting the dendrogram at every level and validating the clusters obtained at each level. Genetic Algorithms (GAs) have proven to be a promising technique for solving complex optimization problems. In this paper, we propose an Optimal Clustering Genetic Algorithm (OCGA) to find the optimal number of clusters. The proposed method has been applied to several artificially generated datasets, and it was observed to require fewer cluster-validation iterations to arrive at the optimal number of clusters.
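To make the cost of exhaustive validation concrete, the following is a minimal sketch of the baseline described above: build a dendrogram, cut it at every level, and score each cut with a validity index. It is not the OCGA itself, which the abstract does not specify. The choice of single linkage, the Dunn index as the validity measure, the toy 1-D data, and all function names are assumptions made for illustration only.

```python
from itertools import combinations

def agglomerate(points):
    """Single-linkage agglomerative clustering on 1-D points (illustrative).

    Returns one partition per dendrogram level, from n singleton
    clusters down to a single cluster containing every point.
    """
    clusters = [[p] for p in points]
    levels = [[list(c) for c in clusters]]
    while len(clusters) > 1:
        # Pick the pair of clusters whose closest members are nearest.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: min(
                abs(a - b) for a in clusters[ij[0]] for b in clusters[ij[1]]
            ),
        )
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
        levels.append([list(c) for c in clusters])
    return levels

def dunn_index(partition):
    """Dunn index: smallest between-cluster gap / largest cluster diameter.

    Assumed validity measure; higher means more compact, better-separated
    clusters. Requires at least two clusters and one non-singleton cluster.
    """
    min_separation = min(
        abs(a - b)
        for c1, c2 in combinations(partition, 2)
        for a in c1
        for b in c2
    )
    max_diameter = max(max(c) - min(c) for c in partition)
    return min_separation / max_diameter

# Toy data (assumed): three well-separated groups on a line.
points = [1.0, 1.2, 1.1, 5.0, 5.3, 5.1, 9.0, 9.2, 9.4]
levels = agglomerate(points)

# Exhaustive search: validate every cut with 2 <= k <= n - 1 clusters.
scores = {
    len(partition): dunn_index(partition)
    for partition in levels
    if 2 <= len(partition) < len(points)
}
validations = len(scores)
best_k = max(scores, key=scores.get)
best_score = scores[best_k]
print(f"validated {validations} cuts; best k = {best_k}")
```

Here every level between 2 and n - 1 clusters is validated once, so the number of validations grows with the depth of the hierarchy. A GA-based search, as proposed in the paper, would instead evaluate only the candidate cluster counts that appear in its population.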
