Bayesian community detection for networks with covariates

The increasing prevalence of network data in a wide variety of fields, and the need to extract useful information from them, have spurred rapid development of related models and algorithms. Among the various learning tasks on network data, community detection, the discovery of node clusters or "communities," has arguably received the most attention in the scientific community. In many real-world applications, network data come with additional information in the form of node or edge covariates that should ideally be leveraged for inference. In this paper, we add to a limited literature on community detection for networks with covariates by proposing a Bayesian stochastic block model with a covariate-dependent random partition prior. Under our prior, the covariates enter explicitly into the prior distribution on cluster membership. Our model quantifies the uncertainty of all parameter estimates, including community membership. Importantly, and unlike the majority of existing methods, our model learns the number of communities via posterior inference rather than assuming it to be known. Our model can be applied to community detection in both dense and sparse networks, with both categorical and continuous covariates, and our MCMC algorithm is efficient with good mixing properties. We demonstrate the superior performance of our model over existing models in a comprehensive simulation study and in applications to two real datasets.
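The abstract does not spell out the form of the covariate-dependent partition prior, so the following is only a generic sketch of the two ingredients it names, not the paper's model. It assumes a hypothetical covariate-weighted Chinese restaurant process (`covariate_crp`, with made-up parameters `alpha` and `rho`): a node joins an existing cluster with weight proportional to that cluster's size times a similarity between its scalar covariate and the cluster's mean covariate, and opens a new cluster with weight `alpha`, so the number of clusters is random rather than fixed. Given the labels, edges are drawn from a Bernoulli stochastic block model with a hypothetical planted-partition block matrix, and the SBM log-likelihood is evaluated, the quantity an MCMC sampler over memberships would repeatedly use. In a full Bayesian treatment the block probabilities would also get priors; here they are fixed for simplicity.

```python
import math
import random


def covariate_crp(x, alpha=1.0, rho=2.0, rng=None):
    """Draw a random partition whose cluster probabilities depend on covariates.

    Hypothetical covariate-weighted Chinese restaurant process (illustration only):
    node i joins existing cluster k with weight n_k * exp(-rho * |x_i - mean_k|)
    and opens a new cluster with weight alpha (alpha must be > 0).
    Labels are numbered 0, 1, 2, ... in order of first appearance.
    """
    rng = rng or random.Random(0)
    labels, members = [], []  # members[k] holds the covariates of cluster k
    for xi in x:
        weights = [len(m) * math.exp(-rho * abs(xi - sum(m) / len(m))) for m in members]
        weights.append(alpha)  # last slot means "open a new cluster"
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(members):
            members.append([xi])
        else:
            members[k].append(xi)
        labels.append(k)
    return labels


def block_matrix(num_blocks, p_in, p_out):
    """Hypothetical planted-partition block probabilities: p_in on the diagonal, p_out elsewhere."""
    return [[p_in if a == b else p_out for b in range(num_blocks)] for a in range(num_blocks)]


def sample_sbm(labels, B, rng=None):
    """Sample a symmetric adjacency matrix without self-loops from a Bernoulli SBM."""
    rng = rng or random.Random(1)
    n = len(labels)
    A = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < B[labels[i]][labels[j]]:
                A[i][j] = A[j][i] = 1
    return A


def sbm_log_likelihood(A, labels, B):
    """Bernoulli SBM log-likelihood over unordered pairs i < j (entries of B in (0, 1))."""
    ll = 0.0
    n = len(labels)
    for i in range(n):
        for j in range(i + 1, n):
            p = B[labels[i]][labels[j]]
            ll += math.log(p) if A[i][j] else math.log1p(-p)
    return ll


if __name__ == "__main__":
    x = [0.1, 0.2, 0.15, 3.0, 3.1, 2.9, 6.0, 6.2]  # toy continuous node covariates
    labels = covariate_crp(x, alpha=0.5, rho=3.0)
    B = block_matrix(max(labels) + 1, p_in=0.6, p_out=0.05)
    A = sample_sbm(labels, B)
    print(labels, sbm_log_likelihood(A, labels, B))
```

Nodes with similar covariates tend to share a cluster under this prior, while `alpha` controls how readily new clusters appear; this mirrors, in a simplified way, how a covariate-dependent partition prior lets the number of communities be inferred rather than fixed in advance.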
