Evolutionary clustering framework based on distance matrix for arbitrary-shaped data sets

Data clustering plays a key role in both scientific and real-world applications. However, current clustering methods still face challenges such as clustering arbitrary-shaped data sets and detecting the number of clusters automatically. This study addresses these two challenges. A novel clustering method, named automatic evolutionary clustering method based on distance matrix (AED), is proposed to determine the proper cluster number automatically and to find the optimal clustering result. In AED, a distance matrix is first computed using a specific distance metric, such as the Euclidean distance or the path distance, and this distance matrix is then partitioned by an evolutionary clustering framework. In this framework, a fixed-length representation scheme is used to encode the clustering result, a novel crossover scheme is introduced to increase the convergence speed, and a validity index is proposed to evaluate both the intermediate and the final clustering results. AED is systematically compared with several state-of-the-art clustering methods on both hyper-spherical and irregular-shaped data sets, and the experimental results suggest that the authors' approach not only detects the correct cluster numbers but also achieves better accuracy on most of the test problems.
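The first step described in the abstract, building a distance matrix from either a Euclidean or a path distance, can be illustrated with a minimal sketch. The abstract does not define the path distance, so the code below assumes the common minimax formulation, in which the distance between two points is the smallest achievable value of the largest hop along any connecting path. The function names and the toy data are hypothetical and are not taken from the paper.

```python
import numpy as np


def euclidean_distance_matrix(points):
    """Return the n x n matrix of pairwise Euclidean distances."""
    pts = np.asarray(points, dtype=float)
    diff = pts[:, None, :] - pts[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def minimax_path_distance_matrix(dist):
    """Return minimax path distances derived from a base distance matrix.

    Assumption: "path distance" means the minimax path distance.
    A Floyd-Warshall style sweep replaces d[i, j] with
    min(d[i, j], max(d[i, k], d[k, j])) for every intermediate point k.
    """
    d = np.array(dist, dtype=float, copy=True)
    n = d.shape[0]
    for k in range(n):
        # Largest hop on the route i -> k -> j, for all pairs (i, j) at once.
        via_k = np.maximum(d[:, k][:, None], d[k, :][None, :])
        d = np.minimum(d, via_k)
    return d


# Toy data (hypothetical): a chain of four points and one far-away point.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (10.0, 0.0)]
d_euclid = euclidean_distance_matrix(points)
d_path = minimax_path_distance_matrix(d_euclid)
```

On this toy data the two ends of the chain are 3 apart under the Euclidean distance but only 1 apart under the minimax path distance, because every hop along the chain has length 1. The isolated point stays far from the chain under both metrics. This is the property that makes a path-style distance attractive for elongated or irregular-shaped clusters.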
