Efficient eigen-updating for spectral graph clustering

Partitioning a graph into groups of vertices such that vertices within each group are more densely connected to one another than to vertices in other groups, a task known as graph clustering, is often used to gain insight into the organisation of large-scale networks and for visualisation purposes. Whereas a large number of dedicated techniques have recently been proposed for static graphs, the design of online graph clustering methods tailored to evolving networks is a challenging problem that is much less documented in the literature. Motivated by the broad variety of applications concerned, ranging from the study of biological networks and the analysis of citation networks to the exploration of communication networks such as the World Wide Web, the main purpose of this paper is to introduce a novel, computationally efficient approach to graph clustering in the evolutionary context. Namely, the method promoted in this article can be viewed as an incremental eigenvalue solution for the spectral clustering method described by Ng et al. (2001) [35]. The incremental eigenvalue solution is a general technique for finding approximate eigenvectors of a symmetric matrix after the matrix has been changed. As well as outlining the approach in detail, we present a theoretical bound on the quality of the approximate eigenvectors using perturbation theory. We then derive a novel spectral clustering algorithm called Incremental Approximate Spectral Clustering (IASC). The IASC algorithm is simple to implement, and its efficacy is demonstrated on both synthetic and real datasets modelling the evolution of an HIV epidemic, a citation network and the purchase history graph of an e-commerce website.
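The abstract combines two ideas: the spectral clustering method of Ng et al. (normalise the affinity matrix, take its leading eigenvectors, normalise their rows and cluster them with k-means) and the reuse of previously computed eigenvectors when the graph changes. The Python sketch below illustrates both under stated assumptions. For the update step it swaps in a generic Rayleigh-Ritz projection onto the span of the old eigenvectors and the columns touched by the change; this is not the IASC update of the paper. All function names (`normalized_affinity`, `top_k_eig`, `rayleigh_ritz_update`, `njw_rows`, `kmeans`, `two_cliques`) and the toy two-clique graph are hypothetical, and dense NumPy matrices are assumed for clarity.

```python
import numpy as np

# Hypothetical illustration: names and the toy graph are assumptions, not the paper's code.


def normalized_affinity(W):
    """Return D^{-1/2} W D^{-1/2}, the matrix whose leading eigenvectors Ng et al. use."""
    d = W.sum(axis=1)
    inv_sqrt = np.zeros_like(d, dtype=float)
    inv_sqrt[d > 0] = 1.0 / np.sqrt(d[d > 0])  # isolated vertices get a zero row/column
    return inv_sqrt[:, None] * W * inv_sqrt[None, :]


def top_k_eig(M, k):
    """Exact k largest eigenpairs of a symmetric matrix (the full-solve baseline)."""
    vals, vecs = np.linalg.eigh(M)  # eigenvalues come back in ascending order
    return vals[-k:], vecs[:, -k:]


def rayleigh_ritz_update(U_old, M_new, dM, k, tol=1e-10):
    """Approximate the k leading eigenpairs of M_new = M_old + dM (hypothetical helper).

    Generic Rayleigh-Ritz step on span(U_old, columns of dM touched by the change);
    an illustration of eigen-updating, not the IASC update from the paper.
    """
    touched = np.flatnonzero(np.any(dM != 0, axis=0))  # columns altered by the change
    B = np.hstack([U_old, dM[:, touched]])
    Q, s, _ = np.linalg.svd(B, full_matrices=False)
    Q = Q[:, s > tol * s[0]]         # orthonormal basis of the search subspace
    T = Q.T @ M_new @ Q              # small projected eigenproblem (dense product here)
    vals, vecs = np.linalg.eigh(T)
    return vals[-k:], Q @ vecs[:, -k:]


def njw_rows(U):
    """Scale each row of the eigenvector matrix to unit length, as in Ng et al."""
    norms = np.linalg.norm(U, axis=1, keepdims=True)
    return U / np.where(norms > 0, norms, 1.0)


def kmeans(X, k, iters=50):
    """Lloyd's algorithm with a deterministic farthest-point initialisation."""
    centers = [X[0]]
    for _ in range(1, k):
        dist = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(dist)])
    C = np.array(centers)
    for _ in range(iters):
        labels = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
        C = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else C[j]
                      for j in range(k)])
    return labels


def two_cliques(bridges):
    """Toy graph: 5-cliques {0..4} and {5..9} joined by the given bridge edges."""
    W = np.zeros((10, 10))
    W[:5, :5] = 1.0
    W[5:, 5:] = 1.0
    np.fill_diagonal(W, 0.0)
    for i, j in bridges:
        W[i, j] = W[j, i] = 1.0
    return W


# One edge (3, 6) arrives between snapshots; reuse the old eigenvectors.
k = 2
N_old = normalized_affinity(two_cliques([(4, 5)]))
N_new = normalized_affinity(two_cliques([(4, 5), (3, 6)]))
_, U_old = top_k_eig(N_old, k)
approx_vals, U_approx = rayleigh_ritz_update(U_old, N_new, N_new - N_old, k)
exact_vals, _ = top_k_eig(N_new, k)
labels = kmeans(njw_rows(U_approx), k)
```

In this deliberately symmetric toy graph the search subspace happens to contain the new leading eigenvectors, so the update reproduces the exact leading eigenvalues up to rounding and k-means recovers the two cliques. In general a Rayleigh-Ritz update is only an approximation, whose error perturbation results such as those of Davis and Kahan [8], [45] help quantify. For large sparse graphs one would presumably store matrices sparsely and use an iterative solver such as ARPACK [30] for the baseline.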

[1] H. Hotelling. Analysis of a complex of statistical variables into principal components. 1933.

[2] Michael W. Mahoney et al. Fast Monte Carlo Algorithms for Matrices III. 2006.

[3] M. E. J. Newman. Fast algorithm for detecting community structure in networks. Physical Review E, 2003.

[4] Satu Elisa Schaeffer et al. Graph Clustering. Encyclopedia of Machine Learning and Data Mining, 2017.

[5] Hongyuan Zha et al. On Updating Problems in Latent Semantic Indexing. SIAM J. Sci. Comput., 1997.

[6] Rong Jin et al. An Improved Bound for the Nystrom Method for Large Eigengap. arXiv, 2012.

[7] M. E. J. Newman. Modularity and community structure in networks. Proceedings of the National Academy of Sciences of the United States of America, 2006.

[8] Chandler Davis. The rotation of eigenvectors by a perturbation. 1963.

[9] Ling Huang et al. Fast approximate spectral clustering. KDD, 2009.

[10] Susan T. Dumais et al. Using Linear Algebra for Intelligent Information Retrieval. SIAM Rev., 1995.

[11] Charu C. Aggarwal et al. Graph Clustering. Encyclopedia of Machine Learning and Data Mining, 2010.

[12] Petros Drineas et al. On the Nyström Method for Approximating a Gram Matrix for Improved Kernel-Based Learning. J. Mach. Learn. Res., 2005.

[13] Ye Tian et al. A Fast Incremental Spectral Clustering for Large Data Sets. 2011 12th International Conference on Parallel and Distributed Computing, Applications and Technologies, 2011.

[14] Gavin W. O'Brien et al. Information Management Tools for Updating an SVD-Encoded Indexing Scheme. 1994.

[15] Stéphan Clémençon et al. Incremental Spectral Clustering with the Normalised Laplacian. NIPS, 2011.

[16] Yihong Gong et al. Incremental Spectral Clustering With Application to Monitoring of Evolving Blog Communities. SDM, 2007.

[17] Jitendra Malik et al. Spectral grouping using the Nystrom method. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2004.

[18] Bernhard Schölkopf et al. Nonlinear Component Analysis as a Kernel Eigenvalue Problem. Neural Computation, 1998.

[19] Haitao Zhao et al. A novel incremental principal component analysis and its application for face recognition. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 2006.

[20] Jon M. Kleinberg et al. Overview of the 2003 KDD Cup. SIGKDD Explor., 2003.

[21] Yihong Gong et al. Incremental spectral clustering by efficiently updating the eigen-system. Pattern Recognit., 2010.

[22] Nathan Halko et al. Finding Structure with Randomness: Probabilistic Algorithms for Constructing Approximate Matrix Decompositions. SIAM Rev., 2009.

[23] Jitendra Malik et al. Normalized cuts and image segmentation. Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 1997.

[24] M. E. J. Newman et al. Finding and evaluating community structure in networks. Physical Review E, 2003.

[25] G. Stewart et al. Matrix Perturbation Theory. 1990.

[26] Petros Drineas et al. Fast Monte Carlo Algorithms for Matrices II: Computing a Low-Rank Approximation to a Matrix. 2004.

[27] Willem H. Haemers et al. Enumeration of cospectral graphs. Eur. J. Comb., 2004.

[28] Santo Fortunato. Community detection in graphs. arXiv, 2009.

[29] Alan M. Frieze et al. Random graphs. SODA, 2006.

[30] Chao Yang et al. ARPACK users' guide: solution of large-scale eigenvalue problems with implicitly restarted Arnoldi methods. Software, environments, tools, 1998.

[31] H. Weyl. Das asymptotische Verteilungsgesetz der Eigenwerte linearer partieller Differentialgleichungen (mit einer Anwendung auf die Theorie der Hohlraumstrahlung). 1912.

[32] Haitao Zhao et al. Incremental eigen decomposition. 2003.

[33] Ulrike von Luxburg. A tutorial on spectral clustering. Stat. Comput., 2007.

[34] Fan Chung. Spectral Graph Theory. 1996.

[35] Andrew Y. Ng et al. On Spectral Clustering: Analysis and an algorithm. NIPS, 2001.

[36] Ameet Talwalkar et al. Sampling Techniques for the Nystrom Method. AISTATS, 2009.

[37] William M. Rand. Objective Criteria for the Evaluation of Clustering Methods. 1971.

[38] Stéphan Clémençon et al. The HIV/AIDS epidemic in Cuba: description and tentative explanation of its low HIV prevalence. BMC Infectious Diseases, 2007.

[39] Theodoros Evgeniou et al. Link Discovery using Graph Feature Tracking. NIPS, 2010.

[40] Seungjin Choi et al. Nyström Approximations for Scalable Face Recognition: A Comparative Study. ICONIP, 2011.

[41] Matthias W. Seeger et al. Using the Nyström Method to Speed Up Kernel Machines. NIPS, 2000.

[42] Srinivasan Parthasarathy et al. Scalable graph clustering using stochastic flows: applications to community discovery. KDD, 2009.

[43] Robert E. Tarjan et al. Graph Clustering and Minimum Cut Trees. Internet Math., 2004.

[44] Tom Duckett et al. Incremental Spectral Clustering and Its Application To Topological Mapping. Proceedings 2007 IEEE International Conference on Robotics and Automation, 2007.

[45] W. Kahan et al. The Rotation of Eigenvectors by a Perturbation. III. 1970.

[46] James T. Kwok et al. Time and space efficient spectral clustering via column sampling. CVPR, 2011.