Contextual Stochastic Block Model: Sharp Thresholds and Contiguity

We study community detection in the contextual stochastic block model (arXiv:1807.09596 [cs.SI], arXiv:1607.02675 [stat.ME]). In arXiv:1807.09596 [cs.SI], the second author and coauthors studied this problem in the setting of sparse graphs with high-dimensional node covariates. Using the non-rigorous cavity method from statistical physics, they conjectured the sharp limits for community detection in this setting, and verified the information-theoretic threshold under the assumption that the average degree of the observed graph is large. The conjecture is expected to hold as soon as the average degree exceeds one, so that the graph has a giant component. We establish this conjecture and characterize the sharp threshold for detection and weak recovery.
