Community detection and stochastic block models: recent developments

The stochastic block model (SBM) is a random graph model with planted clusters. It is widely employed as a canonical model to study clustering and community detection, and more generally provides fertile ground for studying the statistical and computational tradeoffs that arise in network and data sciences. This note surveys recent developments that establish the fundamental limits for community detection in the SBM, with respect to both information-theoretic and computational thresholds, and for various recovery requirements such as exact, partial and weak recovery (a.k.a. detection). The main results discussed are the phase transition for exact recovery at the Chernoff-Hellinger threshold, the phase transition for weak recovery at the Kesten-Stigum threshold, the optimal distortion-SNR tradeoff for partial recovery, the learning of the SBM parameters, and the gap between information-theoretic and computational thresholds. The note also covers some of the algorithms developed in the quest to achieve these limits, in particular two-round algorithms via graph-splitting, semidefinite programming, linearized belief propagation, and classical and nonbacktracking spectral methods. A few open problems are also discussed.
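
To make the model and the two named thresholds concrete, here is a minimal sketch restricted to the simplest case of two symmetric communities, which is an assumption of this illustration and not the general setting of the note. The function names are hypothetical. The sketch uses the standard forms of the thresholds in that special case: the Kesten-Stigum condition (a - b)^2 > 2(a + b) for edge probabilities a/n and b/n, and the condition |sqrt(a) - sqrt(b)| > sqrt(2) for edge probabilities a*log(n)/n and b*log(n)/n, to which the Chernoff-Hellinger threshold reduces for two symmetric communities. Behavior exactly at the boundary is left aside.

```python
import math
import random


def sample_symmetric_sbm(n, p_in, p_out, seed=0):
    """Sample a two-community SBM; returns (labels, edges). Illustrative only."""
    rng = random.Random(seed)
    # Planted labels: each vertex joins community 0 or 1 uniformly at random.
    labels = [rng.randrange(2) for _ in range(n)]
    edges = []
    for u in range(n):
        for v in range(u + 1, n):
            # The edge probability depends only on whether the labels agree.
            p = p_in if labels[u] == labels[v] else p_out
            if rng.random() < p:
                edges.append((u, v))
    return labels, edges


def above_kesten_stigum(a, b):
    """Sparse regime (p_in = a/n, p_out = b/n): weak recovery condition."""
    return (a - b) ** 2 > 2 * (a + b)


def above_exact_recovery(a, b):
    """Logarithmic regime (p_in = a*log(n)/n, p_out = b*log(n)/n)."""
    return abs(math.sqrt(a) - math.sqrt(b)) > math.sqrt(2)


if __name__ == "__main__":
    n, a, b = 200, 5.0, 1.0
    labels, edges = sample_symmetric_sbm(n, a / n, b / n, seed=1)
    print(len(edges), above_kesten_stigum(a, b), above_exact_recovery(9.0, 1.0))
```

The sampler loops over all vertex pairs, so it is only meant for small n; it shows the generative model, not any of the recovery algorithms surveyed in the note.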
