Local Algorithms for Block Models with Side Information

There has been recent interest in understanding the power of local algorithms for optimization and inference problems on sparse graphs. Gamarnik and Sudan (2014) showed that local algorithms are weaker than global algorithms for finding large independent sets in sparse random regular graphs, thus refuting a conjecture by Hatami, Lovász, and Szegedy (2012). Montanari (2015) showed that local algorithms are suboptimal for finding a community with high connectivity in sparse Erdős-Rényi random graphs. For the symmetric planted partition problem (also known as community detection for block models) on sparse graphs, a simple observation is that local algorithms cannot have non-trivial performance. In this work we consider the effect of side information on local algorithms for community detection under the binary symmetric stochastic block model. In the block model with side information, each of the n vertices is labeled + or - independently and uniformly at random; each pair of vertices is connected independently with probability a/n if both have the same label and b/n otherwise. The goal is to estimate the underlying vertex labeling given 1) the graph structure and 2) side information in the form of a vertex labeling positively correlated with the true one. Let α denote the probability that each given vertex label is incorrect. Assuming that the ratio a/b between the in- and out-degree is Θ(1) and the average degree (a+b)/2 = n^{o(1)}, we show that a local algorithm, namely belief propagation run on the local neighborhoods, maximizes the expected fraction of vertices labeled correctly in the following three regimes:
1) |a-b| < 2 and all 0 < α < 1/2;
2) (a-b)^2 > C(a+b) for some constant C and all 0 < α < 1/2;
3) all a, b, provided that α is at most α* for some constant α* ∈ (0, 1/2).
Thus, in contrast to the case of independent sets or a single community in random graphs, and to the case of symmetric block models without side information, we show that local algorithms achieve optimal performance in the above three regimes for the block model with side information. To complement our results, in the large degree limit a → ∞, we give a formula for the expected fraction of vertices labeled correctly by local belief propagation, in terms of a fixed point of a recursion derived from the density evolution analysis with Gaussian approximations.
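
The model and the local algorithm described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`sample_block_model`, `edge_term`, `local_bp`) and the radius parameter `t` are hypothetical, and the update rule is assumed to be the standard tree recursion for log-likelihood ratios, in which each neighbor with incoming message x contributes log((a e^x + b) / (b e^x + a)) and non-edge contributions are neglected. Here α is taken to be the probability that a side-information label is flipped.

```python
import math
import random


def sample_block_model(n, a, b, alpha, rng):
    """Sample labels, adjacency lists, and noisy side information."""
    # Each vertex gets label +1 or -1 independently and uniformly.
    labels = [rng.choice((1, -1)) for _ in range(n)]
    adj = [[] for _ in range(n)]
    for u in range(n):
        for v in range(u + 1, n):
            # Edge probability a/n within a community, b/n across.
            p = (a if labels[u] == labels[v] else b) / n
            if rng.random() < p:
                adj[u].append(v)
                adj[v].append(u)
    # Side information: the true label, flipped with probability alpha.
    side = [-s if rng.random() < alpha else s for s in labels]
    return labels, adj, side


def edge_term(x, a, b):
    """Contribution of one neighbor whose log-likelihood ratio is x."""
    x = max(-30.0, min(30.0, x))  # clamp to avoid overflow in exp
    ex = math.exp(x)
    return math.log((a * ex + b) / (b * ex + a))


def local_bp(adj, side, a, b, alpha, t):
    """Estimate labels from radius-t neighborhoods (t >= 1, assumed)."""
    n = len(adj)
    h = math.log((1 - alpha) / alpha)
    prior = [h * s for s in side]  # evidence from side information alone
    # msg[(u, v)]: log-likelihood ratio for u's label, ignoring neighbor v.
    msg = {(u, v): prior[u] for u in range(n) for v in adj[u]}
    for _ in range(t - 1):
        msg = {
            (u, v): prior[u]
            + sum(edge_term(msg[(w, u)], a, b) for w in adj[u] if w != v)
            for (u, v) in msg
        }
    beliefs = [
        prior[u] + sum(edge_term(msg[(w, u)], a, b) for w in adj[u])
        for u in range(n)
    ]
    return [1 if x >= 0 else -1 for x in beliefs]
```

Calling `sample_block_model(n, a, b, alpha, random.Random(seed))` and then `local_bp(adj, side, a, b, alpha, t)` returns one ±1 estimate per vertex. Because the side information breaks the symmetry between the two labels, the estimate can be compared with the true labels directly, without maximizing over a global sign flip.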

[1]  Bruce E. Hajek,et al.  Achieving Exact Cluster Recovery Threshold via Semidefinite Programming: Extensions , 2015, IEEE Transactions on Information Theory.

[2]  Anima Anandkumar,et al.  A tensor approach to learning mixed membership community models , 2013, J. Mach. Learn. Res..

[3]  Varun Kanade,et al.  Global and Local Information in Clustering Labeled Block Models , 2014, IEEE Transactions on Information Theory.

[4]  S. Kak Information, physics, and computation , 1996 .

[5]  Elchanan Mossel,et al.  Consistency Thresholds for the Planted Bisection Model , 2014, STOC.

[6]  Mark Jerrum,et al.  The Metropolis Algorithm for Graph Bisection , 1998, Discret. Appl. Math..

[7]  Bruce E. Hajek,et al.  Recovering a Hidden Community Beyond the Spectral Limit in O(|E|log*|V|) Time , 2015, ArXiv.

[8]  Raymond J. Mooney,et al.  A probabilistic framework for semi-supervised clustering , 2004, KDD.

[9]  Arindam Banerjee,et al.  Semi-supervised Clustering by Seeding , 2002, ICML.

[10]  Martin E. Dyer,et al.  The Solution of Some Random NP-Hard Problems in Polynomial Expected Time , 1989, J. Algorithms.

[11]  Anderson Y. Zhang,et al.  Minimax Rates of Community Detection in Stochastic Block Models , 2015, ArXiv.

[12]  T. Snijders,et al.  Estimation and Prediction for Stochastic Blockmodels for Graphs with Latent Block Structure , 1997 .

[13]  Florent Krzakala,et al.  Comparative study for inference of hidden classes in stochastic block models , 2012, ArXiv.

[14]  Andrea Montanari,et al.  Tight bounds for LDPC and LDGM codes under MAP decoding , 2004, IEEE Transactions on Information Theory.

[15]  Jiaming Xu,et al.  Achieving Exact Cluster Recovery Threshold via Semidefinite Programming , 2016 .

[16]  Amin Coja-Oghlan,et al.  Graph Partitioning via Adaptive Spectral Techniques , 2009, Combinatorics, Probability and Computing.

[17]  Frank McSherry,et al.  Spectral partitioning of random graphs , 2001, Proceedings 2001 IEEE International Conference on Cluster Computing.

[18]  H. Kesten,et al.  Additional Limit Theorems for Indecomposable Multidimensional Galton-Watson Processes , 1966 .

[19]  Afonso S. Bandeira,et al.  Random Laplacian Matrices and Convex Relaxations , 2015, Found. Comput. Math..

[20]  Laurent Massoulié,et al.  Non-backtracking Spectrum of Random Graphs: Community Detection and Non-regular Ramanujan Graphs , 2014, 2015 IEEE 56th Annual Symposium on Foundations of Computer Science.

[21]  Cristopher Moore,et al.  Asymptotic analysis of the stochastic block model for modular networks and its algorithmic applications , 2011, Physical review. E, Statistical, nonlinear, and soft matter physics.

[22]  Yudong Chen,et al.  Statistical-Computational Tradeoffs in Planted Problems and Submatrix Localization with a Growing Number of Clusters and Submatrices , 2014, J. Mach. Learn. Res..

[23]  Richard M. Karp,et al.  Algorithms for graph partitioning on the planted partition model , 2001, Random Struct. Algorithms.

[24]  Madhu Sudan,et al.  Limits of local algorithms over sparse random graphs , 2013, ITCS.

[25]  Laurent Massoulié,et al.  Community detection thresholds and the weak Ramanujan property , 2013, STOC.

[26]  Alexandre Proutière,et al.  Accurate Community Detection in the Stochastic Block Model via Spectral Algorithms , 2014, ArXiv.

[27]  P. Bickel,et al.  A nonparametric view of network models and Newman–Girvan and other modularities , 2009, Proceedings of the National Academy of Sciences.

[28]  Fedor Nazarov,et al.  Perfect matchings as IID factors on non-amenable groups , 2009, Eur. J. Comb..

[29]  Anima Anandkumar,et al.  A Tensor Spectral Approach to Learning Mixed Membership Community Models , 2013, COLT.

[30]  Armen E. Allahverdyan,et al.  Phase Transitions in Community Detection: A Solvable Toy Model , 2013, ArXiv.

[31]  Elchanan Mossel,et al.  Reconstruction and estimation in the planted partition model , 2012, Probability Theory and Related Fields.

[32]  Alexandre Proutière,et al.  Community Detection via Random and Adaptive Sampling , 2014, COLT.

[33]  B. Szegedy,et al.  Limits of locally–globally convergent graph sequences , 2012, Geometric and Functional Analysis.

[34]  Bálint Virág,et al.  Local algorithms for independent sets are half-optimal , 2014, ArXiv.

[35]  Amin Coja-Oghlan,et al.  A spectral heuristic for bisecting random graphs , 2005, SODA '05.

[36]  Cristopher Moore,et al.  Phase transitions in semisupervised clustering of sparse networks , 2014, Physical review. E, Statistical, nonlinear, and soft matter physics.

[37]  Elchanan Mossel,et al.  A Proof of the Block Model Threshold Conjecture , 2013, Combinatorica.

[38]  Elchanan Mossel,et al.  Density Evolution in the Degree-correlated Stochastic Block Model , 2015, COLT.

[39]  Elchanan Mossel,et al.  Belief propagation, robust reconstruction and optimal recovery of block models , 2013, COLT.

[40]  Mark E. J. Newman,et al.  Structure and inference in annotated networks , 2015, Nature Communications.

[41]  Sujay Sanghavi,et al.  Clustering Sparse Graphs , 2012, NIPS.

[42]  Xiaodong Li,et al.  Robust and Computationally Feasible Community Detection in the Presence of Arbitrary Outlier Nodes , 2014, ArXiv.

[43]  Bernhard Schölkopf,et al.  Cluster Kernels for Semi-Supervised Learning , 2002, NIPS.

[44]  Alexander S. Wein,et al.  A semidefinite program for unbalanced multisection in the stochastic block model , 2017, 2017 International Conference on Sampling Theory and Applications (SampTA).

[45]  Bruce E. Hajek,et al.  Achieving exact cluster recovery threshold via semidefinite programming , 2015, 2015 IEEE International Symposium on Information Theory (ISIT).

[46]  Emmanuel Abbe,et al.  Exact Recovery in the Stochastic Block Model , 2014, IEEE Transactions on Information Theory.

[47]  Roman Vershynin,et al.  Community detection in sparse networks via Grothendieck’s inequality , 2014, Probability Theory and Related Fields.

[48]  I. Shevtsova,et al.  An improvement of the Berry–Esseen inequality with applications to Poisson and mixed Poisson random sums , 2009, 0912.2795.

[49]  Emmanuel Abbe,et al.  Community detection in general stochastic block models: fundamental limits and efficient recovery algorithms , 2015, ArXiv.

[50]  Rüdiger L. Urbanke,et al.  Modern Coding Theory , 2008 .

[51]  Armen E. Allahverdyan,et al.  Community detection with and without prior information , 2009, ArXiv.

[52]  Chao Gao,et al.  Achieving Optimal Misclassification Proportion in Stochastic Block Models , 2015, J. Mach. Learn. Res..

[53]  Kathryn B. Laskey,et al.  Stochastic blockmodels: First steps , 1983 .

[54]  R. Karp,et al.  Algorithms for graph partitioning on the planted partition model , 2001 .