Covariate-assisted spectral clustering

Summary

Biological and social systems consist of myriad interacting units. The interactions can be represented in the form of a graph or network. Measurements of these graphs can reveal the underlying structure of these interactions, which provides insight into the systems that generated the graphs. Moreover, in applications such as connectomics, social networks, and genomics, graph data are accompanied by contextualizing measures on each node. We use these node covariates to help uncover latent communities in a graph, using a modification of spectral clustering. Statistical guarantees, including a bound on the misclustering rate, are provided under a joint mixture model that we call the node-contextualized stochastic blockmodel. The bound is used to derive conditions for achieving perfect clustering. For most simulated cases, covariate-assisted spectral clustering yields results superior both to regularized spectral clustering without node covariates and to an adaptation of canonical correlation analysis. We apply our clustering method to large brain graphs derived from diffusion MRI data, using the node locations or neurological region membership as covariates. In both cases, covariate-assisted spectral clustering yields clusters that are easier to interpret neurologically.
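The summary does not spell out how spectral clustering is modified, so the following is only a minimal sketch under stated assumptions: the graph enters through the square of a regularized Laplacian, the covariates enter through a weighted Gram matrix, and the leading eigenvectors of their sum are clustered with k-means. The function names, the tuning parameter `alpha`, the regularizer `tau`, and the simulated two-block data are all illustrative choices, not taken from the text above.

```python
import numpy as np


def regularized_laplacian(A, tau=None):
    """Return D_tau^{-1/2} A D_tau^{-1/2}, where D_tau = D + tau * I."""
    degrees = A.sum(axis=1)
    if tau is None:
        tau = degrees.mean()  # assumed default: the average node degree
    inv_sqrt = 1.0 / np.sqrt(degrees + tau)
    return A * inv_sqrt[:, None] * inv_sqrt[None, :]


def kmeans(points, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm with farthest-point initialization."""
    rng = np.random.default_rng(seed)
    centers = [points[rng.integers(len(points))]]
    for _ in range(1, k):
        d2 = np.min([((points - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(points[np.argmax(d2)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def covariate_assisted_clustering(A, X, k, alpha=1.0, tau=None):
    """Cluster nodes from adjacency matrix A and covariate matrix X.

    Assumed similarity: L @ L + alpha * X @ X.T, where alpha balances
    graph structure against node covariates.
    """
    L = regularized_laplacian(A, tau)
    S = L @ L + alpha * (X @ X.T)
    _, eigvecs = np.linalg.eigh(S)  # eigenvalues come back in ascending order
    U = eigvecs[:, -k:]             # keep the k leading eigenvectors
    return kmeans(U, k)


# Illustrative data: a two-block stochastic blockmodel with one noisy covariate.
rng = np.random.default_rng(42)
truth = np.repeat([0, 1], 40)
P = np.where(truth[:, None] == truth[None, :], 0.5, 0.05)
upper = np.triu(rng.random((80, 80)) < P, k=1)
A = (upper | upper.T).astype(float)  # symmetric, no self-loops
X = (2 * truth - 1)[:, None] + 0.3 * rng.standard_normal((80, 1))

labels = covariate_assisted_clustering(A, X, k=2, alpha=0.05)
# Agreement with the planted blocks, up to swapping the two labels.
accuracy = max(np.mean(labels == truth), np.mean(labels != truth))
```

In this sketch `alpha` is set by hand; setting it to zero reduces the procedure to regularized spectral clustering on the graph alone, which is one of the baselines the summary compares against.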
