Stochastic block models with multiple continuous attributes

The stochastic block model (SBM) is a probabilistic model for community structure in networks. Typically, only the adjacency matrix is used to perform SBM parameter inference. In this paper, we consider circumstances in which each node has an associated vector of continuous attributes that is also used to learn the node-to-community assignments and the corresponding SBM parameters. Our model assumes that the attributes of the nodes in a given community can be described by a common multivariate Gaussian model. In this augmented, attributed SBM, the objective is to learn the SBM connectivity probabilities jointly with the multivariate Gaussian parameters describing each community. While there are recent examples in the literature that combine connectivity and attribute information to inform community detection, our model is the first augmented stochastic block model to handle multiple continuous attributes. In biological applications, for example, this flexibility allows connectivity information to be augmented with continuous measurements from multiple experimental modalities. Because the lack of labeled network data often makes community detection results difficult to validate, we highlight the usefulness of our model for two network prediction tasks: link prediction and collaborative filtering. After fitting the attributed stochastic block model, one can predict either the attribute vector or the connectivity pattern of a new node given the complementary source of information (connectivity or attributes, respectively). We also present two biological examples in which the attributed stochastic block model provides satisfactory performance in the link prediction and collaborative filtering tasks.
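To make the generative picture concrete, the following is a minimal sketch of how one might sample a network from a model of this kind. It is not the paper's implementation. It assumes an undirected network without self-loops, and it assumes that edges and attributes are conditionally independent given the community labels; neither assumption is stated above. The function name `sample_attributed_sbm` and the parameter names `pi`, `theta`, `means`, and `chol_factors` are hypothetical. Each community's covariance matrix is supplied through a lower-triangular Cholesky factor so that the sketch needs only the Python standard library.

```python
import random


def sample_attributed_sbm(n, pi, theta, means, chol_factors, seed=0):
    """Sample labels, adjacency matrix, and attributes (illustrative sketch).

    n            : number of nodes
    pi           : list of K community mixing proportions
    theta        : K x K symmetric matrix of between-community edge probabilities
    means        : list of K mean vectors, each of length d
    chol_factors : list of K lower-triangular d x d matrices L_k, so that
                   community k has covariance L_k L_k^T
    """
    rng = random.Random(seed)
    num_communities = len(pi)
    d = len(means[0])

    # Draw a community label for every node.
    z = rng.choices(range(num_communities), weights=pi, k=n)

    # Draw each undirected edge independently with a probability that
    # depends only on the communities of its two endpoints (assumption:
    # no self-loops).
    adjacency = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < theta[z[i]][z[j]]:
                adjacency[i][j] = adjacency[j][i] = 1

    # Draw each attribute vector from its community's multivariate
    # Gaussian: x = mu_k + L_k * eps, with eps standard normal.
    attributes = []
    for i in range(n):
        mu = means[z[i]]
        chol = chol_factors[z[i]]
        eps = [rng.gauss(0.0, 1.0) for _ in range(d)]
        x = [mu[r] + sum(chol[r][c] * eps[c] for c in range(r + 1))
             for r in range(d)]
        attributes.append(x)

    return z, adjacency, attributes


if __name__ == "__main__":
    # Two assortative communities with two continuous attributes each.
    labels, A, X = sample_attributed_sbm(
        n=6,
        pi=[0.5, 0.5],
        theta=[[0.9, 0.1], [0.1, 0.9]],
        means=[[0.0, 0.0], [3.0, 3.0]],
        chol_factors=[[[1.0, 0.0], [0.5, 1.0]], [[1.0, 0.0], [0.0, 1.0]]],
    )
    print(labels)
    print(A)
    print(X)
```

Inference runs in the opposite direction: given the adjacency matrix and the attribute vectors, the labels and the parameters are the unknowns to be learned.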
