Compressive Spectral Clustering

Spectral clustering has become a popular technique due to its high performance in many contexts. It comprises three main steps: create a similarity graph between N objects to cluster, compute the first k eigenvectors of its Laplacian matrix to define a feature vector for each object, and run k-means on these features to separate objects into k classes. Each of these three steps becomes computationally intensive for large N and/or k. We propose to speed up the last two steps based on recent results in the emerging field of graph signal processing: graph filtering of random signals, and random sampling of bandlimited graph signals. We prove that our method, with a gain in computation time that can reach several orders of magnitude, is in fact an approximation of spectral clustering, for which we are able to control the error. We test the performance of our method on artificial and real-world network data.
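The three baseline steps named above can be illustrated with a minimal sketch. This is the standard spectral clustering pipeline, not the accelerated method the abstract proposes. The Gaussian kernel, the normalized Laplacian, the row normalization, and the names `spectral_clustering`, `sigma`, and `seed` are assumptions made for this illustration and are not taken from the text.

```python
import numpy as np
from scipy.cluster.vq import kmeans2
from scipy.spatial.distance import cdist


def spectral_clustering(points, k, sigma=1.0, seed=0):
    """Baseline spectral clustering (illustrative sketch, assumed design choices)."""
    # Step 1: similarity graph between the N objects.
    # Assumption: fully connected graph with Gaussian kernel weights.
    sq_dists = cdist(points, points, metric="sqeuclidean")
    weights = np.exp(-sq_dists / (2.0 * sigma**2))
    np.fill_diagonal(weights, 0.0)

    # Step 2: first k eigenvectors of the graph Laplacian.
    # Assumption: normalized Laplacian L = I - D^{-1/2} W D^{-1/2}.
    degrees = weights.sum(axis=1)
    inv_sqrt_deg = 1.0 / np.sqrt(degrees)
    laplacian = np.eye(len(points)) - inv_sqrt_deg[:, None] * weights * inv_sqrt_deg[None, :]
    _, eigvecs = np.linalg.eigh(laplacian)  # eigenvalues come back in ascending order
    features = eigvecs[:, :k]               # one k-dimensional feature vector per object
    features = features / np.linalg.norm(features, axis=1, keepdims=True)

    # Step 3: k-means on the feature vectors.
    _, labels = kmeans2(features, k, minit="++", seed=seed)
    return labels


# Toy usage: two well-separated blobs of 20 points each.
rng = np.random.default_rng(0)
points = np.vstack([
    rng.normal(0.0, 0.3, size=(20, 2)),
    rng.normal(5.0, 0.3, size=(20, 2)),
])
labels = spectral_clustering(points, k=2)
```

In this dense form, step 2 relies on a full eigendecomposition whose cost grows cubically with N, which gives a sense of why the eigenvector and k-means steps are the ones targeted for acceleration.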
