Density Evolution in the Degree-correlated Stochastic Block Model

There has been a recent surge of interest in identifying sharp thresholds for cluster recovery under the stochastic block model. In this paper, we address the more refined question of how many vertices will be misclassified on average. We consider the binary form of the stochastic block model, where $n$ vertices are partitioned into two clusters with edge probability $a/n$ within the first cluster, $c/n$ within the second cluster, and $b/n$ across clusters. Suppose that as $n \to \infty$, $a = b + \mu \sqrt{b}$ and $c = b + \nu \sqrt{b}$ for two fixed constants $\mu, \nu$, and $b \to \infty$ with $b = n^{o(1)}$. When the cluster sizes are balanced and $\mu \neq \nu$, we show that the minimum fraction of misclassified vertices on average is given by $Q(\sqrt{v^*})$, where $Q(x)$ is the Q-function of the standard normal distribution, $v^*$ is the unique fixed point of $v = \frac{(\mu-\nu)^2}{16} + \frac{(\mu+\nu)^2}{16} \mathbb{E}[\tanh(v + \sqrt{v} Z)]$, and $Z$ is standard normal. Moreover, this minimum average misclassified fraction is attained by a local algorithm, namely belief propagation, in time linear in the number of edges. Our proof techniques are based on connecting the cluster recovery problem to tree reconstruction problems, and on analyzing the density evolution of belief propagation on trees with Gaussian approximations.
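To make the fixed-point characterization concrete, the following is a minimal numerical sketch, not the paper's procedure. It assumes that iterating the map $v \mapsto \frac{(\mu-\nu)^2}{16} + \frac{(\mu+\nu)^2}{16} \mathbb{E}[\tanh(v + \sqrt{v} Z)]$ from $v = 0$ converges to $v^*$, and it approximates the Gaussian expectation with a trapezoid rule. The function names `expected_tanh`, `fixed_point`, and `q_function`, as well as the tolerance and grid parameters, are hypothetical choices made for illustration.

```python
import math


def expected_tanh(v, half_width=8.0, num_points=2001):
    """Approximate E[tanh(v + sqrt(v) Z)], Z ~ N(0, 1), by the trapezoid rule.

    The grid [-half_width, half_width] is an assumption; the Gaussian
    density is negligible outside it.
    """
    step = 2.0 * half_width / (num_points - 1)
    root_v = math.sqrt(v)
    total = 0.0
    for i in range(num_points):
        z = -half_width + i * step
        weight = 0.5 if i in (0, num_points - 1) else 1.0
        density = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
        total += weight * math.tanh(v + root_v * z) * density
    return total * step


def fixed_point(mu, nu, tol=1e-10, max_iter=10000):
    """Iterate v <- offset + gain * E[tanh(v + sqrt(v) Z)] starting from v = 0.

    Starting at zero is an assumption of this sketch.
    """
    offset = (mu - nu) ** 2 / 16.0
    gain = (mu + nu) ** 2 / 16.0
    v = 0.0
    for _ in range(max_iter):
        v_next = offset + gain * expected_tanh(v)
        if abs(v_next - v) < tol:
            return v_next
        v = v_next
    raise RuntimeError("fixed-point iteration did not converge")


def q_function(x):
    """Standard normal tail probability Q(x) = P(Z > x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


if __name__ == "__main__":
    mu, nu = 3.0, 1.0  # illustrative values only
    v_star = fixed_point(mu, nu)
    print("v* =", v_star)
    print("predicted misclassified fraction =", q_function(math.sqrt(v_star)))
```

A useful sanity check is the case $\nu = -\mu$: the second term vanishes, so $v^* = \mu^2/4$ in closed form and the predicted fraction reduces to $Q(|\mu|/2)$.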
