Information-theoretic bounds for exact recovery in weighted stochastic block models using the Renyi divergence

We derive sharp thresholds for exact recovery of communities in a weighted stochastic block model, where observations are collected in the form of a weighted adjacency matrix and the weight of each edge is generated independently from a distribution determined by the community memberships of its endpoints. Our main result characterizes the precise boundary between success and failure of maximum likelihood estimation when edge weights are drawn from discrete distributions. This boundary is expressed in terms of the Rényi divergence of order $\frac{1}{2}$ between the distributions of within-community and between-community edges. When the Rényi divergence is above a certain threshold, meaning the two edge distributions are sufficiently separated, maximum likelihood succeeds with probability tending to 1; when it is below the threshold, maximum likelihood fails with probability bounded away from 0. In the language of graphical channels, the Rényi divergence pinpoints the information-theoretic capacity of discrete graphical channels with binary inputs. Our results generalize previously established thresholds derived specifically for unweighted block models, and support a natural intuition relating the intrinsic hardness of community estimation to the problem of edge classification. Along the way, we establish a general relationship between the Rényi divergence and the probability of success of the maximum likelihood estimator for arbitrary edge weight distributions. Finally, we discuss consequences of our bounds for the related problems of censored block models and submatrix localization, which may be seen as special cases of the framework developed in this paper.
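For reference, the Rényi divergence of order $\frac{1}{2}$ between discrete distributions $P$ and $Q$ on a common finite support is $D_{1/2}(P \| Q) = -2 \log \sum_x \sqrt{P(x) Q(x)}$. The minimal sketch below computes it for arbitrary finite-support weight distributions and checks the unweighted special case, where within- and between-community edges are Bernoulli with probabilities $p = a \log n / n$ and $q = b \log n / n$. In that regime $n \, D_{1/2} / \log n$ approaches $(\sqrt{a} - \sqrt{b})^2$, the quantity appearing in the known exact-recovery condition $(\sqrt{a} - \sqrt{b})^2 > 2$ for two equal-sized communities. The function names `renyi_half` and `bernoulli` and the example weight distributions are hypothetical, chosen only for illustration; this is not the paper's code.

```python
import math
from typing import Sequence


def renyi_half(p: Sequence[float], q: Sequence[float]) -> float:
    """Renyi divergence of order 1/2 between discrete distributions p and q.

    D_{1/2}(P || Q) = -2 * log(sum_x sqrt(P(x) * Q(x))).
    p and q are probability vectors indexed by the same finite support.
    """
    if len(p) != len(q):
        raise ValueError("p and q must be defined on the same support")
    # Bhattacharyya coefficient; clamp at 1.0 to absorb floating-point rounding.
    bc = min(sum(math.sqrt(pi * qi) for pi, qi in zip(p, q)), 1.0)
    if bc == 0.0:
        # Disjoint supports: the two edge distributions are perfectly separable.
        return math.inf
    return -2.0 * math.log(bc)


def bernoulli(r: float) -> list[float]:
    """Distribution of an unweighted edge: [P(no edge), P(edge)]."""
    return [1.0 - r, r]


# Hypothetical weighted example: edge weights take values in {0, 1, 2}.
P = [0.5, 0.3, 0.2]  # within-community weight distribution
Q = [0.7, 0.2, 0.1]  # between-community weight distribution
weighted_div = renyi_half(P, Q)

# Unweighted special case in the log n / n regime.
n = 10**6
a, b = 9.0, 1.0
p_in, q_out = a * math.log(n) / n, b * math.log(n) / n
ratio = n * renyi_half(bernoulli(p_in), bernoulli(q_out)) / math.log(n)
# For large n, ratio is close to (sqrt(a) - sqrt(b))**2 = 4.
print(weighted_div, ratio)
```

Working through the Bhattacharyya coefficient rather than a generic Rényi formula keeps the order-$\frac{1}{2}$ case symmetric in $P$ and $Q$ and makes the disjoint-support limit explicit.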