NK Hybrid Genetic Algorithm for Clustering

This paper proposes the NK hybrid genetic algorithm (GA) for clustering. To evaluate solutions, the hybrid algorithm uses the NK clustering validation criterion 2 (NKCV2), which draws on information about the disposition of $N$ small groups of objects, each composed of $K+1$ objects of the dataset. Experimental results show that density-based regions can be identified by using NKCV2 with a fixed, small $K$. In NKCV2, the relationship between decision variables is known, which allows gray box optimization to be applied. Mutation operators, a partition crossover (PX), and a local search strategy are proposed, all of which use information about the relationship between decision variables. In PX, the evaluation function is decomposed into $q$ independent components; PX then deterministically returns the best among $2^{q}$ possible offspring with computational complexity $O(N)$. The NK hybrid GA allows the detection of clusters with arbitrary shapes and the automatic estimation of the number of clusters. In the experiments, the NK hybrid GA produced very good results when compared with another GA approach and with state-of-the-art clustering algorithms.
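To make the decomposition idea behind PX concrete, the following is a minimal sketch of a generic partition crossover for an evaluation function written as a sum of subfunctions over small sets of variables. It is an illustration under stated assumptions, not the clustering-specific operator of the paper: the names `partition_crossover` and `subfunctions` are hypothetical, the objective is assumed to be maximized, and NKCV2 itself is not reproduced. Variables on which the parents differ are grouped into components whenever they share a subfunction; each of the q components can then be inherited from whichever parent scores better on it, which yields the best of the 2^q recombinations without enumerating them.

```python
def partition_crossover(p1, p2, subfunctions):
    """Generic partition crossover sketch (maximization assumed).

    p1, p2       : parent solutions as equal-length lists.
    subfunctions : list of (variables, f) pairs; f takes a tuple holding the
                   values of `variables` and returns that term's contribution.
    Returns (child, q), where q is the number of independent components.
    """
    differing = [i for i in range(len(p1)) if p1[i] != p2[i]]
    parent = {i: i for i in differing}  # union-find over differing variables

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Differing variables that share a subfunction belong to one component.
    for variables, _ in subfunctions:
        touched = [v for v in variables if v in parent]
        for v in touched[1:]:
            parent[find(v)] = find(touched[0])

    # Accumulate, per component, the gain of taking p2's values over p1's.
    # Every other variable in these subfunctions is common to both parents,
    # so each component's gain does not depend on the other components.
    gain = {}
    for variables, f in subfunctions:
        touched = [v for v in variables if v in parent]
        if not touched:
            continue
        root = find(touched[0])
        from_p1 = f(tuple(p1[v] for v in variables))
        from_p2 = f(tuple(p2[v] for v in variables))
        gain[root] = gain.get(root, 0.0) + (from_p2 - from_p1)

    # Inherit each component from the parent that evaluates better on it.
    child = list(p1)
    for i in differing:
        if gain.get(find(i), 0.0) > 0:
            child[i] = p2[i]
    q = len({find(i) for i in differing})
    return child, q


# Toy usage: two subfunctions over disjoint variable pairs, each summing its inputs.
subs = [((0, 1), sum), ((2, 3), sum)]
child, q = partition_crossover([0, 0, 1, 1], [1, 1, 0, 0], subs)
print(child, q)
```

In this toy case the parents differ on all four variables, which split into two components, so the child takes variables 0 and 1 from the second parent and variables 2 and 3 from the first. The cost of the sketch grows with the number of subfunctions and their arity rather than with the number of candidate offspring.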
