Asymptotic mutual information for the balanced binary stochastic block model

We develop an information-theoretic view of the stochastic block model, a popular statistical model for the large-scale structure of complex networks. A graph G from such a model is generated by first assigning vertex labels at random from a finite alphabet, and then connecting vertices with edge probabilities depending on the labels of the endpoints. In the case of the symmetric two-group model, we establish an explicit ‘single-letter’ characterization of the pervertex mutual information between the vertex labels and the graph, when the mean vertex degree diverges. The explicit expression of the mutual information is intimately related to estimation-theoretic quantities, and –in particular– reveals a phase transition at the critical point for community detection. Below the critical point the per-vertex mutual information is asymptotically the same as if edges were independent. Correspondingly, no algorithm can estimate the partition better than random guessing. Conversely, above the threshold, the per-vertex mutual information is strictly smaller than the independent-edges upper bound. In this regime there exists a procedure that estimates the vertex labels better than random guessing.

[1]  Kathryn B. Laskey,et al.  Stochastic blockmodels: First steps , 1983 .

[2]  Ravi B. Boppana,et al.  Eigenvalues and graph bisection: An average-case analysis , 1987, 28th Annual Symposium on Foundations of Computer Science (sfcs 1987).

[3]  Frank Thomson Leighton,et al.  Graph bisection algorithms with good average case behavior , 1984, Comb..

[4]  Martin E. Dyer,et al.  The Solution of Some Random NP-Hard Problems in Polynomial Expected Time , 1989, J. Algorithms.

[5]  T. Snijders,et al.  Estimation and Prediction for Stochastic Blockmodels for Graphs with Latent Block Structure , 1997 .

[6]  Mark Jerrum,et al.  The Metropolis Algorithm for Graph Bisection , 1998, Discret. Appl. Math..

[7]  Y. Peres,et al.  Broadcasting on trees and the Ising model , 2000 .

[8]  Frank McSherry,et al.  Spectral partitioning of random graphs , 2001, Proceedings 2001 IEEE International Conference on Cluster Computing.

[9]  Russell Impagliazzo,et al.  Hill-climbing finds random planted bisections , 2001, SODA '01.

[10]  Richard M. Karp,et al.  Algorithms for graph partitioning on the planted partition model , 2001, Random Struct. Algorithms.

[11]  F. Guerra Broken Replica Symmetry Bounds in the Mean Field Spin Glass Model , 2002, cond-mat/0205123.

[12]  Andrea Montanari,et al.  Life Above Threshold: From List Decoding to Area Theorem and MSE , 2004, ArXiv.

[13]  Shlomo Shamai,et al.  Mutual information and minimum mean-square error in Gaussian channels , 2004, IEEE Transactions on Information Theory.

[14]  Andrea Montanari,et al.  Analysis of Belief Propagation for Non-Linear Problems: The Example of CDMA (or: How to Prove Tanaka's Formula) , 2006, 2006 IEEE Information Theory Workshop - ITW '06 Punta del Este.

[15]  S. Chatterjee A generalization of the Lindeberg principle , 2005, math/0508519.

[16]  Edoardo M. Airoldi,et al.  Mixed Membership Stochastic Blockmodels , 2007, NIPS.

[17]  C. Donati-Martin,et al.  The largest eigenvalues of finite rank deformation of large Wigner matrices: Convergence and nonuniversality of the fluctuations. , 2007, 0706.0136.

[18]  Andrea Montanari,et al.  Message-passing algorithms for compressed sensing , 2009, Proceedings of the National Academy of Sciences.

[19]  P. Bickel,et al.  A nonparametric view of network models and Newman–Girvan and other modularities , 2009, Proceedings of the National Academy of Sciences.

[20]  A. Guionnet,et al.  An Introduction to Random Matrices , 2009 .

[21]  Amin Coja-Oghlan,et al.  Graph Partitioning via Adaptive Spectral Techniques , 2009, Combinatorics, Probability and Computing.

[22]  Satish Babu Korada,et al.  Exact Solution of the Gauge Symmetric p-Spin Glass Model on a Complete Graph , 2009 .

[23]  Raj Rao Nadakuditi,et al.  The eigenvalues and eigenvectors of finite, low rank perturbations of large random matrices , 2009, 0910.2120.

[24]  Andrea Montanari,et al.  The Generalized Area Theorem and Some of its Consequences , 2005, IEEE Transactions on Information Theory.

[25]  Andrea Montanari,et al.  The dynamics of message passing on dense graphs, with applications to compressed sensing , 2010, 2010 IEEE International Symposium on Information Theory.

[26]  Andrea Montanari,et al.  The dynamics of message passing on dense graphs, with applications to compressed sensing , 2010, ISIT.

[27]  Bin Yu,et al.  Spectral clustering and the high-dimensional stochastic blockmodel , 2010, 1007.1684.

[28]  Andrea Montanari,et al.  Applications of the Lindeberg Principle in Communications and Statistical Learning , 2010, IEEE Transactions on Information Theory.

[29]  Cristopher Moore,et al.  Asymptotic analysis of the stochastic block model for modular networks and its algorithmic applications , 2011, Physical review. E, Statistical, nonlinear, and soft matter physics.

[30]  Andrea Montanari,et al.  Universality in Polytope Phase Transitions and Message Passing Algorithms , 2012, ArXiv.

[31]  Andrea Montanari,et al.  Graphical Models Concepts in Compressed Sensing , 2010, Compressed Sensing.

[32]  Adel Javanmard,et al.  State Evolution for General Approximate Message Passing Algorithms, with Applications to Spatial Coupling , 2012, ArXiv.

[33]  Sundeep Rangan,et al.  Iterative estimation of constrained rank-one matrices in noise , 2012, 2012 IEEE International Symposium on Information Theory Proceedings.

[34]  E. Bolthausen An Iterative Construction of Solutions of the TAP Equations for the Sherrington–Kirkpatrick Model , 2012, 1201.2891.

[35]  Laurent Massoulié,et al.  Community Detection in the Labelled Stochastic Block Model , 2012, ArXiv.

[36]  Sujay Sanghavi,et al.  Clustering Sparse Graphs , 2012, NIPS.

[37]  Edoardo M. Airoldi,et al.  Stochastic blockmodels with growing number of classes , 2010, Biometrika.

[38]  Leonidas J. Guibas,et al.  Consistent Shape Maps via Semidefinite Programming , 2013, SGP '13.

[39]  Andrea Montanari,et al.  Information-theoretically optimal sparse PCA , 2014, 2014 IEEE International Symposium on Information Theory.

[40]  Laurent Massoulié,et al.  Edge Label Inference in Generalized Stochastic Block Models: from Spectral Theory to Impossibility Results , 2014, COLT.

[41]  Roman Vershynin,et al.  Community detection in sparse networks via Grothendieck’s inequality , 2014, Probability Theory and Related Fields.

[42]  Amit Singer,et al.  Linear inverse problems on Erdős-Rényi graphs: Information-theoretic limits and efficient recovery , 2014, 2014 IEEE International Symposium on Information Theory.

[43]  Elchanan Mossel,et al.  Belief propagation, robust reconstruction and optimal recovery of block models , 2013, COLT.

[44]  Amit Singer,et al.  Decoding Binary Node Labels from Censored Edge Measurements: Phase Transition and Efficient Recovery , 2014, IEEE Transactions on Network Science and Engineering.

[45]  Alexandre Proutière,et al.  Accurate Community Detection in the Stochastic Block Model via Spectral Algorithms , 2014, ArXiv.

[46]  Tim Roughgarden,et al.  Tight Error Bounds for Structured Prediction , 2014, ArXiv.

[47]  Leonidas J. Guibas,et al.  Near-Optimal Joint Object Matching via Convex Relaxation , 2014, ICML.

[48]  Elchanan Mossel,et al.  Consistency Thresholds for Binary Symmetric Block Models , 2014, ArXiv.

[49]  Andrea Montanari,et al.  Statistical Estimation: From Denoising to Sparse Regression and Hidden Cliques , 2014, ArXiv.

[50]  Laurent Massoulié,et al.  Community detection thresholds and the weak Ramanujan property , 2013, STOC.

[51]  Andrea Montanari,et al.  Finding Hidden Cliques of Size $$\sqrt{N/e}$$N/e in Nearly Linear Time , 2013, Found. Comput. Math..

[52]  Emmanuel Abbe,et al.  Community detection in general stochastic block models: fundamental limits and efficient recovery algorithms , 2015, ArXiv.

[53]  Anup Rao,et al.  Stochastic Block Model and Community Detection in Sparse Graphs: A spectral algorithm with optimal rate of recovery , 2015, COLT.

[54]  Andrea Montanari,et al.  Conditional Random Fields, Planted Constraint Satisfaction, and Entropy Concentration , 2015, Theory Comput..

[55]  Florent Krzakala,et al.  MMSE of probabilistic low-rank matrix estimation: Universality with respect to the output channel , 2015, 2015 53rd Annual Allerton Conference on Communication, Control, and Computing (Allerton).

[56]  Anderson Y. Zhang,et al.  Minimax Rates of Community Detection in Stochastic Block Models , 2015, ArXiv.

[57]  Andrea Montanari,et al.  Finding One Community in a Sparse Graph , 2015, Journal of Statistical Physics.

[58]  Emmanuel Abbe,et al.  Detection in the stochastic block model with multiple clusters: proof of the achievability conjectures, acyclic BP, and the information-computation gap , 2015, ArXiv.

[59]  Laurent Massoulié,et al.  Non-backtracking Spectrum of Random Graphs: Community Detection and Non-regular Ramanujan Graphs , 2014, 2015 IEEE 56th Annual Symposium on Foundations of Computer Science.

[60]  Florent Krzakala,et al.  Spectral detection in the censored block model , 2015, 2015 IEEE International Symposium on Information Theory (ISIT).

[61]  Emmanuel Abbe,et al.  Community Detection in General Stochastic Block models: Fundamental Limits and Efficient Algorithms for Recovery , 2015, 2015 IEEE 56th Annual Symposium on Foundations of Computer Science.

[62]  Tengyuan Liang,et al.  Inference via Message Passing on Partially Labeled Stochastic Block Models , 2016, 1603.06923.

[63]  Yudong Chen,et al.  Statistical-Computational Tradeoffs in Planted Problems and Submatrix Localization with a Growing Number of Clusters and Submatrices , 2014, J. Mach. Learn. Res..

[64]  Bruce E. Hajek,et al.  Achieving exact cluster recovery threshold via semidefinite programming , 2015, 2015 IEEE International Symposium on Information Theory (ISIT).

[65]  Emmanuel Abbe,et al.  Exact Recovery in the Stochastic Block Model , 2014, IEEE Transactions on Information Theory.

[66]  Andrea J. Goldsmith,et al.  Information Recovery From Pairwise Measurements , 2015, IEEE Transactions on Information Theory.

[67]  van Vu,et al.  A Simple SVD Algorithm for Finding Hidden Partitions , 2014, Combinatorics, Probability and Computing.

[68]  Elchanan Mossel,et al.  A Proof of the Block Model Threshold Conjecture , 2013, Combinatorica.

[69]  Afonso S. Bandeira,et al.  Random Laplacian Matrices and Convex Relaxations , 2015, Found. Comput. Math..