Spectral clustering and semi-supervised learning using evolving similarity graphs

Highlights

- We describe a spectral graph clustering method that aims to optimise a graph structure.
- The initial population is constructed using nearest neighbour graphs, defining a new way to convert a matrix to a one-dimensional chromosome.
- We focus on evolving similarity graphs by applying fitness functions based on clustering criteria.
- The proposed method is generic, as it can be applied to all problems that can be modelled as graphs.

Abstract

Spectral graph clustering has become very popular in recent years, owing to the simplicity of its implementation and its performance compared with other popular methods. In this article, we propose a novel spectral graph clustering method that uses genetic algorithms to optimise the structure of a graph and achieve better clustering results. We focus on evolving the constructed similarity graphs by applying a fitness function (also called an objective function) based on some of the most commonly used clustering criteria. The initial population is built from nearest neighbour graphs, some of their variants, and some arbitrary graphs, all represented as matrices. Each of these matrices is transformed into a chromosome so that it can be used in the evolutionary process. The algorithm's performance depends greatly on how the initial population is created, as suggested by the various techniques examined for this article. The most important advantage of the proposed method is its generic nature: it can be applied to several problems that can be modelled as graphs, including clustering, dimensionality reduction and classification. Experiments have been conducted on a traditional dances dataset and on various other multidimensional datasets, using evaluation methods based on both internal and external clustering criteria, to examine the performance of the proposed algorithm. The results are promising.
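The pipeline described above (nearest neighbour graph, matrix-to-chromosome conversion, fitness from a clustering criterion) can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not specify the encoding or the criteria, so the following are assumptions made here for illustration: the chromosome is the flattened upper triangle of a symmetric adjacency matrix, the clustering step is a normalised spectral clustering with a small built-in k-means, and the fitness is the Calinski-Harabasz index. All function names are hypothetical.

```python
import numpy as np

def knn_adjacency(X, k):
    """Symmetric 0/1 k-nearest-neighbour adjacency matrix without self-loops."""
    n = X.shape[0]
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)  # squared distances
    np.fill_diagonal(d2, np.inf)                             # a point is not its own neighbour
    nearest = np.argsort(d2, axis=1)[:, :k]
    A = np.zeros((n, n))
    A[np.repeat(np.arange(n), k), nearest.ravel()] = 1.0
    return np.maximum(A, A.T)                                # keep an edge if either end chose it

def to_chromosome(A):
    """Assumed encoding: flatten the strict upper triangle into a 1-D gene vector."""
    return A[np.triu_indices(A.shape[0], k=1)]

def from_chromosome(genes, n):
    """Rebuild the symmetric n x n matrix from its upper-triangle genes."""
    A = np.zeros((n, n))
    A[np.triu_indices(n, k=1)] = genes
    return A + A.T

def spectral_labels(A, n_clusters, n_iter=20):
    """Normalised spectral clustering followed by a simple deterministic k-means."""
    deg = np.maximum(A.sum(axis=1), 1e-12)
    S = A / np.sqrt(np.outer(deg, deg))                      # D^-1/2 A D^-1/2
    _, vecs = np.linalg.eigh(S)                              # eigenvalues in ascending order
    U = vecs[:, -n_clusters:]                                # eigenvectors of the largest ones
    U = U / np.maximum(np.linalg.norm(U, axis=1, keepdims=True), 1e-12)
    centers = [U[0]]                                         # farthest-point initialisation
    for _ in range(1, n_clusters):
        dist = np.min([((U - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(U[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = np.argmin(((U[:, None, :] - centers[None]) ** 2).sum(axis=2), axis=1)
        for j in range(n_clusters):
            if np.any(labels == j):
                centers[j] = U[labels == j].mean(axis=0)
    return labels

def calinski_harabasz(X, labels):
    """Internal criterion: between-cluster over within-cluster dispersion."""
    n, ks = X.shape[0], np.unique(labels)
    if len(ks) < 2:
        return 0.0                                           # undefined for a single cluster
    mean = X.mean(axis=0)
    between = within = 0.0
    for c in ks:
        Xc = X[labels == c]
        between += len(Xc) * ((Xc.mean(axis=0) - mean) ** 2).sum()
        within += ((Xc - Xc.mean(axis=0)) ** 2).sum()
    return (between / (len(ks) - 1)) / (within / (n - len(ks)))

def fitness(genes, X, n_clusters):
    """Score a chromosome by clustering on its graph and evaluating the partition."""
    A = from_chromosome(genes, X.shape[0])
    return calinski_harabasz(X, spectral_labels(A, n_clusters))
```

In a genetic algorithm, each member of the initial population would be produced by something like `to_chromosome(knn_adjacency(X, k))` for different values of `k`, and `fitness` would rank chromosomes after crossover and mutation. Storing only the upper triangle keeps every decoded graph symmetric, which is one reason this encoding was chosen for the sketch.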
