Large-Scale Spectral Clustering on Graphs

Graph clustering has received growing attention in recent years as an important analytical technique, due both to the prevalence of graph data and to the usefulness of graph structures for exploiting intrinsic data characteristics. However, as graph data grows in scale, identifying clusters becomes increasingly challenging. In this paper, we propose an efficient clustering algorithm for large-scale graph data based on spectral methods. The key idea is to repeatedly generate a small number of "supernodes" connected to the regular nodes, thereby compressing the original graph into a sparse bipartite graph. By clustering this bipartite graph with spectral methods, we greatly improve efficiency without a significant loss in clustering quality. Extensive experiments demonstrate the effectiveness and efficiency of our approach.
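The abstract does not specify how supernodes are chosen or how they are weighted, so the Python sketch below fills those gaps with explicitly labeled assumptions to illustrate the general shape of such a pipeline. It is not presented as the authors' algorithm. The sketch (1) samples p seed nodes uniformly at random to act as supernodes (hypothetical choice), (2) connects every regular node to each supernode with weight equal to the number of walks of length at most 2 between them (hypothetical choice), and (3) clusters the resulting n × p bipartite graph by taking the SVD of the normalized matrix D1^{-1/2} B D2^{-1/2}, in the spirit of bipartite spectral co-clustering, followed by k-means on the row-normalized left singular vectors. The helper names `build_bipartite`, `kmeans`, and `bipartite_spectral_clustering` are illustrative. Dense NumPy arrays are used for brevity where a real implementation would use sparse matrices, and only a single round of supernode generation is shown.

```python
import numpy as np


def build_bipartite(A, seeds):
    """Hypothetical weighting: link node i to supernode j (seed node seeds[j])
    with the number of walks of length <= 2 between them."""
    A_hat = A + np.eye(A.shape[0])      # self-loops so direct links also count
    return A_hat @ A_hat[:, seeds]      # n x p bipartite weight matrix B


def kmeans(X, k, iters=100):
    """Plain Lloyd's k-means with deterministic farthest-point initialization."""
    centers = [X[0]]
    for _ in range(1, k):
        d = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers, dtype=float)
    labels = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        new_labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(new_labels == j):  # keep the old center if a cluster empties
                centers[j] = X[new_labels == j].mean(axis=0)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels


def bipartite_spectral_clustering(A, k, p, seeds=None, rng=None):
    """Sketch: compress A into an n x p bipartite graph, then cluster it spectrally."""
    n = A.shape[0]
    if seeds is None:
        # Hypothetical supernode choice: p distinct nodes sampled uniformly.
        seeds = np.random.default_rng(rng).choice(n, size=p, replace=False)
    B = build_bipartite(A, seeds)
    eps = 1e-12                          # guards against zero row/column sums
    d1 = B.sum(axis=1) + eps             # degrees of regular nodes
    d2 = B.sum(axis=0) + eps             # degrees of supernodes
    Bn = B / np.sqrt(d1)[:, None] / np.sqrt(d2)[None, :]
    # Thin SVD of an n x p matrix: cost grows linearly in n for fixed p.
    U, _, _ = np.linalg.svd(Bn, full_matrices=False)
    emb = U[:, :k]                       # top-k left singular vectors
    emb = emb / (np.linalg.norm(emb, axis=1, keepdims=True) + eps)
    return kmeans(emb, k)


if __name__ == "__main__":
    # Two 10-node cliques joined by a single bridge edge (9 -- 10).
    A = np.zeros((20, 20))
    A[:10, :10] = 1
    A[10:, 10:] = 1
    np.fill_diagonal(A, 0)
    A[9, 10] = A[10, 9] = 1
    print(bipartite_spectral_clustering(A, k=2, p=6, seeds=[0, 3, 6, 12, 15, 18]))
```

The efficiency argument is visible in the SVD step: it operates on an n × p matrix instead of an n × n one, so with p much smaller than n the spectral computation stays cheap. Passing explicit `seeds` makes the run deterministic; omitting them falls back to the random sampling assumption.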
