Contextual Stochastic Block Models

We provide the first information-theoretically tight analysis of inference of latent community structure from a sparse graph together with high-dimensional node covariates that are correlated with the same latent communities. Our work bridges recent theoretical breakthroughs in detecting latent community structure without node covariates and a large body of empirical work that uses diverse heuristics to combine node covariates with graphs for inference. In particular, the tightness of our analysis implies that combining the different sources of information is information-theoretically necessary. Our analysis holds for networks of large degree as well as for a Gaussian version of the model.
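To make the setting concrete, the following is a minimal sketch of how one might sample synthetic data of this kind: two latent communities, a sparse graph whose edges depend on the labels, and high-dimensional covariates that share the same labels. The abstract does not fix a parameterization, so the one used here is an assumption: edge probability (d + λ√d)/n within a community and (d − λ√d)/n across communities, and covariates b_i = sqrt(μ/n)·v_i·u + Z_i/√p with a shared random direction u ~ N(0, I_p/p) and noise Z_i ~ N(0, I_p). The function name `sample_csbm` and its arguments are hypothetical and not taken from the paper.

```python
import math
import random


def sample_csbm(n, p, d, lam, mu, seed=0):
    """Sample a two-community contextual SBM (hypothetical parameterization).

    Returns (labels, edges, covariates):
      labels      list of n community labels in {+1, -1}
      edges       set of pairs (i, j) with i < j
      covariates  n x p list of lists (one covariate vector per node)
    """
    rng = random.Random(seed)
    labels = [rng.choice((1, -1)) for _ in range(n)]

    # Assumed edge probabilities: (d + lam*sqrt(d))/n inside a community and
    # (d - lam*sqrt(d))/n across communities, so the average degree is about d.
    p_in = (d + lam * math.sqrt(d)) / n
    p_out = (d - lam * math.sqrt(d)) / n
    if not (0.0 <= p_out <= p_in <= 1.0):
        raise ValueError("need 0 <= lam <= sqrt(d) and d + lam*sqrt(d) <= n")

    # Sparse graph: each pair is an edge independently, given the labels.
    edges = set()
    for i in range(n):
        for j in range(i + 1, n):
            prob = p_in if labels[i] == labels[j] else p_out
            if rng.random() < prob:
                edges.add((i, j))

    # Shared latent direction u ~ N(0, I_p / p).
    u = [rng.gauss(0.0, 1.0 / math.sqrt(p)) for _ in range(p)]

    # Covariates correlated with the same labels:
    # b_i = sqrt(mu / n) * v_i * u + Z_i / sqrt(p), with Z_i ~ N(0, I_p).
    scale = math.sqrt(mu / n)
    covariates = [
        [scale * v * u_k + rng.gauss(0.0, 1.0) / math.sqrt(p) for u_k in u]
        for v in labels
    ]
    return labels, edges, covariates


labels, edges, X = sample_csbm(n=200, p=50, d=5.0, lam=1.5, mu=2.0, seed=1)
print(len(labels), len(edges), len(X), len(X[0]))
```

Here λ controls how informative the graph is and μ how informative the covariates are; setting either one to zero removes the label signal from that source, which is a convenient way to compare graph-only, covariate-only, and combined inference on the same samples.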
