A Genetic Graph-Based Clustering Algorithm

Interest in the analysis and study of clustering techniques has grown since the introduction of new algorithms based on the continuity of the data, motivated by problems such as image segmentation and tracking, where it is difficult to classify data correctly into their appropriate groups, or clusters. Some of these techniques, such as Spectral Clustering (SC), use graph theory to generate the clusters through the spectrum of the graph created by a similarity function applied to the elements of the database. The approach taken by SC makes it possible to handle the problem of data continuity through the graph representation. Based on this idea, this study uses genetic algorithms to select the groups using the same similarity graph built by the Spectral Clustering method. The main contribution is a new algorithm that improves the robustness of Spectral Clustering by reducing its dependency on the similarity metric parameters, which currently affect the performance of SC approaches. This algorithm, named Genetic Graph-based Clustering (GGC), has been tested on different synthetic and real-world datasets, and the experimental results have been compared against classical clustering algorithms such as K-Means, EM and SC.
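
To make the dependency on the similarity metric parameters concrete, the following minimal sketch builds a similarity graph and shows how one parameter changes it. It assumes a Gaussian (RBF) similarity function with bandwidth `sigma`, a common choice in Spectral Clustering that the abstract does not name. The helper names `gaussian_similarity` and `cut_weight` are hypothetical, and the cut weight is only an illustrative measure of group separation, not the fitness function used by GGC.

```python
import math


def gaussian_similarity(points, sigma):
    """Dense similarity matrix with w[i][j] = exp(-||xi - xj||^2 / (2 * sigma^2)).

    Assumption: Gaussian kernel; the diagonal is left at 0 (no self-loops).
    """
    n = len(points)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
            w[i][j] = w[j][i] = math.exp(-d2 / (2.0 * sigma ** 2))
    return w


def cut_weight(w, labels):
    """Total edge weight between points that carry different labels."""
    n = len(labels)
    return sum(
        w[i][j]
        for i in range(n)
        for j in range(i + 1, n)
        if labels[i] != labels[j]
    )


# Two well-separated pairs of points and the partition that groups them.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.0)]
labels = [0, 0, 1, 1]

# The same partition looks well separated or heavily connected
# depending only on the kernel bandwidth.
for sigma in (0.5, 5.0):
    w = gaussian_similarity(points, sigma)
    print(sigma, cut_weight(w, labels))
```

With a small `sigma` the edges between the two groups have weights close to zero, while a large `sigma` connects them strongly, so any method that relies on this graph inherits a sensitivity to the parameter.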
