Accelerated spectral clustering using graph filtering of random signals

We build upon recent advances in graph signal processing to propose a faster spectral clustering algorithm. Classical spectral clustering relies on computing the first k eigenvectors of the Laplacian of the similarity matrix, and the cost of this computation becomes prohibitive for large datasets, even when the matrix is sparse. We show that the spectral clustering distance matrix can be estimated without computing these eigenvectors, by graph filtering random signals. We also take advantage of the stochasticity of these random vectors to estimate the number of clusters k. We compare our method to classical spectral clustering on synthetic data and show that it reaches equal performance while being faster by a factor of at least two on large datasets.
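The idea of estimating spectral clustering distances by graph filtering random signals can be illustrated with a minimal sketch. It is not the authors' implementation, and several choices below are assumptions made for illustration only: the toy two-block random graph, the normalized Laplacian, a Chebyshev polynomial approximation of an ideal low-pass filter, a hand-picked cutoff of 0.4 (a practical method would have to estimate it), the number of random signals d, and the row normalization of the filtered signals. The helper names `normalized_laplacian`, `cheb_coeffs`, and `cheb_filter` are hypothetical.

```python
import numpy as np

def normalized_laplacian(W):
    """L = I - D^{-1/2} W D^{-1/2}; its eigenvalues lie in [0, 2]."""
    deg = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(deg)  # assumes no isolated nodes
    return np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]

def cheb_coeffs(h, order, lmax, n_quad=500):
    """Chebyshev coefficients of h on [0, lmax], so h(lam) ~ sum_j c[j] T_j(x)."""
    theta = np.pi * (np.arange(n_quad) + 0.5) / n_quad
    lam = 0.5 * lmax * (np.cos(theta) + 1.0)  # map x = cos(theta) to [0, lmax]
    j = np.arange(order + 1)
    c = (2.0 / n_quad) * np.cos(np.outer(j, theta)) @ h(lam)
    c[0] *= 0.5
    return c

def cheb_filter(L, R, c, lmax):
    """Apply the polynomial filter to the columns of R using only products with L."""
    Lt = (2.0 / lmax) * L - np.eye(L.shape[0])  # spectrum rescaled to [-1, 1]
    T_prev, T_curr = R, Lt @ R
    out = c[0] * T_prev + c[1] * T_curr
    for cj in c[2:]:
        # Three-term recurrence: T_{j+1} = 2 x T_j - T_{j-1}
        T_prev, T_curr = T_curr, 2.0 * (Lt @ T_curr) - T_prev
        out = out + cj * T_curr
    return out

# Toy graph (assumption): two equal blocks, dense inside, sparse across.
rng = np.random.default_rng(0)
n, d = 200, 40
labels = np.repeat([0, 1], n // 2)
P = np.where(labels[:, None] == labels[None, :], 0.4, 0.02)
upper = np.triu(rng.random((n, n)) < P, 1)
W = (upper | upper.T).astype(float)
L = normalized_laplacian(W)

lmax = 2.0    # upper bound on the normalized Laplacian spectrum
cutoff = 0.4  # assumed to separate the first k = 2 eigenvalues from the rest
c = cheb_coeffs(lambda lam: (lam <= cutoff).astype(float), order=80, lmax=lmax)

# Filter d random Gaussian signals; row i of F is a feature vector for node i.
R = rng.normal(scale=1.0 / np.sqrt(d), size=(n, d))
F = cheb_filter(L, R, c, lmax)
F = F / np.linalg.norm(F, axis=1, keepdims=True)

# Pairwise distances between feature vectors stand in for the
# spectral clustering distances; no eigenvector was computed.
D = np.linalg.norm(F[:, None, :] - F[None, :, :], axis=2)
same = labels[:, None] == labels[None, :]
off_diag = ~np.eye(n, dtype=bool)
within = D[same & off_diag].mean()
between = D[~same].mean()
print(f"mean distance within blocks: {within:.3f}, between blocks: {between:.3f}")
```

On a graph this well separated, the within-block distances should come out much smaller than the between-block ones, so running k-means on the rows of F would be expected to recover the two blocks. The dense matrices are for brevity only; on a large sparse graph the filter needs nothing more than repeated sparse matrix-vector products.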
