A robust dynamic niching genetic algorithm with niche migration for automatic clustering problem

This paper proposes a genetic clustering algorithm based on dynamic niching with niche migration (DNNM-clustering). The algorithm clusters data using a similarity function related to approximate density shape estimation. At each generation it dynamically identifies the niches and applies niche migration, so that both the optimal number of clusters and the cluster centers of the data set evolve automatically, without invoking cluster validity functions. The migration operator lets the niches move slowly, which makes the dynamic niching method independent of the niche radius. Compared with existing methods, the proposed clustering method is robust in three respects: (1) to initialization, (2) to cluster volume (it can detect clusters of different volumes), and (3) to noise. It also requires neither a niche radius nor a pre-specified number of clusters. Several data sets with widely varying characteristics are used to demonstrate its superiority, and an application of DNNM-clustering to unsupervised classification of a multispectral remote sensing image is also provided.
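To make the idea of density-based fitness and per-generation niche identification more concrete, the following is a minimal sketch and not the DNNM-clustering algorithm itself. The abstract gives no formulas, so everything here is an assumption chosen for illustration: the Gaussian-style similarity in `density_fitness`, its width parameter `beta`, the greedy seed selection in `identify_niches`, and the toy data. In particular, this sketch uses a fixed `radius`, which is exactly the parameter the proposed migration operator is said to make unnecessary; the migration operator is not sketched because the abstract does not describe it.

```python
import math

def density_fitness(candidate, data, beta=1.0):
    """Assumed fitness: sum of Gaussian-style similarities between a
    candidate center and every data point (a rough density estimate)."""
    return sum(math.exp(-math.dist(candidate, x) ** 2 / beta) for x in data)

def identify_niches(population, fitness, radius):
    """Greedy niche identification (illustrative only): scan individuals
    from fittest to least fit; an individual becomes a new niche seed
    unless it lies within `radius` of an already chosen seed."""
    order = sorted(range(len(population)), key=lambda i: fitness[i], reverse=True)
    seed_indices = []
    for i in order:
        if all(math.dist(population[i], population[s]) > radius for s in seed_indices):
            seed_indices.append(i)
    return [population[s] for s in seed_indices]

# Toy data: two tight groups of points in the plane (hypothetical values).
data = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
        (10.0, 10.0), (10.1, 10.0), (10.0, 10.1)]

# A small population of candidate cluster centers (hypothetical values).
population = [(0.03, 0.03), (0.5, 0.5), (10.03, 10.03), (9.5, 9.5)]

fitness = [density_fitness(c, data) for c in population]
seeds = identify_niches(population, fitness, radius=2.0)
print(seeds)
```

In this toy run each group of points ends up with one niche seed, so the number of niches matches the number of groups without being specified in advance. The result still depends on the chosen `radius`, which is the sensitivity the abstract says niche migration is meant to remove.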
