Analysis of spectral clustering algorithms for community detection: the general bipartite setting

We consider the analysis of spectral clustering algorithms for community detection under a stochastic block model (SBM). A general spectral clustering algorithm consists of three steps: (1) regularization of an appropriate adjacency or Laplacian matrix, (2) a form of spectral truncation, and (3) a k-means type algorithm in the reduced spectral domain. By varying each step, one can obtain different spectral algorithms. In light of recent developments in refining consistency results for spectral clustering, we identify the necessary bounds at each of these three steps, and then derive and compare consistency results for some existing spectral algorithms as well as a new variant that we propose. The focus of the paper is on providing a better understanding of the analysis of spectral methods for community detection, with an emphasis on the bipartite setting, which has received less theoretical consideration. We show how variations in the spectral truncation step are reflected in the consistency results under a general SBM. We also investigate the necessary bounds for the k-means step in some detail, allowing one to replace this step with any algorithm (k-means type or otherwise) that guarantees the necessary bound. We discuss some of the neglected aspects of the bipartite setting, e.g., the effect of a mismatch between the community structures of the two sides on the performance of spectral methods. Finally, we show how the consistency results can be extended beyond SBMs to the problem of clustering inhomogeneous random graph models that can be approximated by SBMs in a certain sense.
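The three-step pipeline can be sketched for the bipartite setting as follows. This is a minimal illustration assuming NumPy, not the algorithm analyzed or proposed in the paper. The regularization (adding a constant `tau` to every row and column degree before normalizing the biadjacency matrix), the rank-r SVD truncation, and Lloyd's algorithm with k-means++ seeding are common choices picked for concreteness, and all function names are hypothetical. The toy model uses two row communities and three column communities with a rank-2 connectivity matrix, to illustrate a mismatch between the two sides. For simplicity the truncation rank is read off the true connectivity matrix, which an actual method would have to estimate or be given.

```python
import itertools

import numpy as np


def sample_bipartite_sbm(z_row, z_col, B, rng):
    """Draw an n1 x n2 biadjacency matrix with P(A_ij = 1) = B[z_row[i], z_col[j]]."""
    P = B[np.ix_(z_row, z_col)]
    return (rng.random(P.shape) < P).astype(float)


def regularize(A, tau):
    """Step 1 (one illustrative choice, assumed here): D1^{-1/2} A D2^{-1/2},
    where D1 and D2 hold the row and column degrees, each inflated by tau."""
    d_row = A.sum(axis=1) + tau
    d_col = A.sum(axis=0) + tau
    return A / np.sqrt(d_row)[:, None] / np.sqrt(d_col)[None, :]


def truncate(L, r):
    """Step 2: keep the top-r singular triplets of L and return
    singular-value-scaled embeddings of the row and column nodes."""
    U, s, Vt = np.linalg.svd(L, full_matrices=False)  # s is in descending order
    return U[:, :r] * s[:r], Vt[:r, :].T * s[:r]


def kmeans(X, k, rng, n_init=10, n_iter=100):
    """Step 3: Lloyd's algorithm with k-means++ seeding; keep the best of n_init runs."""
    best_labels, best_cost = None, np.inf
    for _ in range(n_init):
        # k-means++ seeding: sample each new center with probability proportional
        # to the squared distance to the nearest center chosen so far.
        centers = [X[rng.integers(len(X))]]
        for _ in range(1, k):
            d2 = ((X[:, None, :] - np.array(centers)[None]) ** 2).sum(-1).min(axis=1)
            centers.append(X[rng.choice(len(X), p=d2 / d2.sum())])
        centers = np.array(centers)
        # Lloyd iterations: assign to the nearest center, then recompute the means.
        for _ in range(n_iter):
            labels = ((X[:, None, :] - centers[None]) ** 2).sum(-1).argmin(axis=1)
            new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                            else centers[j] for j in range(k)])
            if np.allclose(new, centers):
                break
            centers = new
        labels = ((X[:, None, :] - centers[None]) ** 2).sum(-1).argmin(axis=1)
        cost = ((X - centers[labels]) ** 2).sum()
        if cost < best_cost:
            best_labels, best_cost = labels, cost
    return best_labels


def misclassification_rate(z_true, z_hat, k):
    """Fraction of mislabeled nodes, minimized over relabelings of z_hat."""
    return min(np.mean(np.array(perm)[z_hat] != z_true)
               for perm in itertools.permutations(range(k)))


# Toy bipartite SBM with mismatched community counts: K1 = 2 row communities,
# K2 = 3 column communities, and a 2 x 3 connectivity matrix B of rank 2.
rng = np.random.default_rng(0)
z_row = np.repeat([0, 1], 100)
z_col = np.repeat([0, 1, 2], 100)
B = np.array([[0.6, 0.1, 0.35],
              [0.1, 0.6, 0.35]])
A = sample_bipartite_sbm(z_row, z_col, B, rng)

r = np.linalg.matrix_rank(B)  # oracle truncation rank (assumption); at most min(K1, K2)
X_row, X_col = truncate(regularize(A, tau=10.0), r)
err_row = misclassification_rate(z_row, kmeans(X_row, 2, rng), 2)
err_col = misclassification_rate(z_col, kmeans(X_col, 3, rng), 3)
print(f"row error: {err_row:.3f}, column error: {err_col:.3f}")
```

The sketch keeps the three steps as separate functions so that any of them can be swapped out. In particular, `kmeans` could be replaced by another clustering routine that returns labels, although whether that routine satisfies the bound required by the analysis is a separate question.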
