Inference via Message Passing on Partially Labeled Stochastic Block Models

We study the community detection and recovery problem in partially labeled stochastic block models (SBMs). We develop a fast linearized message-passing algorithm that reconstructs the labels of an SBM with $n$ nodes, $k$ blocks, and intra- and inter-block connection probabilities $p$ and $q$, when a fraction $\delta$ of the node labels is revealed. We show that a signal-to-noise ratio ${\sf SNR}(n,k,p,q,\delta)$ characterizes the fundamental limits of inference via local algorithms. On the one hand, when ${\sf SNR}>1$, the linearized message-passing algorithm achieves a mis-classification rate of at most $\exp(-({\sf SNR}-1)/2)$, thus interpolating smoothly between weak and strong consistency. This exponential dependence improves upon the error rate $({\sf SNR}-1)^{-1}$ previously known in the weak-recovery literature. On the other hand, when ${\sf SNR}<1$ for $k=2$, or ${\sf SNR}<1/4$ for general growing $k$, we prove that local algorithms suffer an error rate of at least $\frac{1}{2} - \sqrt{\delta \cdot {\sf SNR}}$, which is only slightly better than random guessing when $\delta$ is small.
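To make the setting concrete, here is a minimal sketch, assuming $k=2$ balanced blocks with labels $\pm 1$, of a generic linearized (non-backtracking) message-passing scheme seeded by the revealed labels. It is not the paper's exact algorithm: the function names `sample_partially_labeled_sbm` and `linearized_message_passing`, the per-iteration rescaling, the re-injection of seed labels, and the parameter values ($n=1000$, $p=0.05$, $q=0.01$, $\delta=0.1$) are all illustrative assumptions. Each directed edge $i \to j$ carries a message equal to the sum of messages flowing into $i$, excluding the one that came back from $j$; revealed nodes always send their own label.

```python
import numpy as np


def sample_partially_labeled_sbm(n, p, q, delta, rng):
    """Hypothetical helper: two blocks with labels +1/-1, edge prob p within
    and q across blocks; each node's label is revealed independently w.p. delta."""
    labels = np.where(np.arange(n) < n // 2, 1, -1)
    probs = np.where(labels[:, None] == labels[None, :], p, q)
    upper = np.triu(rng.random((n, n)) < probs, k=1)  # keep each pair once
    u, v = np.nonzero(upper)
    revealed = rng.random(n) < delta
    return labels, u, v, revealed


def linearized_message_passing(n, u, v, observed, n_iter=4):
    """Sketch of seeded linearized message passing for k = 2.
    `observed` holds +1/-1 for revealed nodes and 0 for hidden ones."""
    m = len(u)
    # Directed edges: e in [0, m) is u->v, e in [m, 2m) is v->u.
    src = np.concatenate([u, v])
    dst = np.concatenate([v, u])
    rev = (np.arange(2 * m) + m) % (2 * m)  # index of the reversed edge
    seed_edge = observed[src] != 0
    msg = observed[src].astype(float)  # hidden nodes start by sending 0
    for _ in range(n_iter):
        incoming = np.bincount(dst, weights=msg, minlength=n)
        new = incoming[src] - msg[rev]  # non-backtracking: drop the j->i message
        scale = new.std()
        if scale > 0:
            new = new / scale  # rescale to keep magnitudes comparable to the seeds
        new[seed_edge] = observed[src[seed_edge]]  # revealed nodes keep sending labels
        msg = new
    score = np.bincount(dst, weights=msg, minlength=n)
    est = np.where(score >= 0, 1, -1)
    revealed = observed != 0
    est[revealed] = observed[revealed]
    return est


rng = np.random.default_rng(0)
n = 1000
labels, u, v, revealed = sample_partially_labeled_sbm(n, p=0.05, q=0.01, delta=0.1, rng=rng)
observed = np.where(revealed, labels, 0)  # the algorithm only sees these labels
est = linearized_message_passing(n, u, v, observed)
hidden = ~revealed
err = np.mean(est[hidden] != labels[hidden])
print(f"mis-classification rate on unrevealed nodes: {err:.3f}")
```

Because the revealed labels anchor the sign of the messages, no label-switching ambiguity arises for $k=2$. The rescaling step is only a numerical convenience in this sketch; without it the messages grow geometrically with the number of iterations.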
