Model-based clustering for populations of networks

Abstract. Until recently, data on populations of networks were rarely available. However, with the advancement of automatic monitoring devices and the growing social and scientific interest in networks, such data have become more widely available. From sociological experiments involving cognitive social structures to fMRI scans revealing large-scale brain networks of groups of patients, there is a growing awareness that we urgently need tools to analyse populations of networks and, in particular, to model the variation between networks due to covariates. We propose a model-based clustering method based on mixtures of generalized linear (mixed) models that can be employed to describe the joint distribution of a population of networks in a parsimonious manner and to identify subpopulations of networks that share certain topological properties of interest (degree distribution, community structure, effect of covariates on the presence of an edge, etc.). Maximum likelihood estimation for the proposed model can be carried out efficiently with an implementation of the EM algorithm. We assess the performance of this method on simulated data and conclude with an example application on advice networks in a small business.
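
To make the idea of clustering networks with a mixture model fitted by EM more concrete, the following is a minimal sketch and not the authors' implementation. It assumes the simplest possible special case: each subpopulation is an intercept-only logistic model, so all edges in a network from cluster c are independent with a common probability p_c. The function name `em_network_mixture`, the simulated data, and the quantile-based initialisation are all illustrative assumptions; the model in the paper also allows covariates and random effects, which are omitted here.

```python
import numpy as np

def em_network_mixture(edge_counts, n_dyads, n_clusters=2, n_iter=100):
    """EM for a mixture of intercept-only Bernoulli edge models (illustrative).

    edge_counts: number of edges observed in each network.
    n_dyads:     number of node pairs per network (same for all networks).
    Returns mixing weights, cluster edge probabilities, and responsibilities.
    """
    m = np.asarray(edge_counts, dtype=float)
    # Assumed initialisation: spread cluster densities over observed quantiles.
    p = np.quantile(m / n_dyads, np.linspace(0.25, 0.75, n_clusters))
    p = np.clip(p, 1e-6, 1 - 1e-6)
    pi = np.full(n_clusters, 1.0 / n_clusters)
    for _ in range(n_iter):
        # E-step: posterior probability that each network belongs to each cluster.
        log_r = (np.log(pi)
                 + m[:, None] * np.log(p)
                 + (n_dyads - m[:, None]) * np.log1p(-p))
        log_r -= log_r.max(axis=1, keepdims=True)  # stabilise before exponentiating
        r = np.exp(log_r)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: weighted maximum likelihood updates.
        weight = np.clip(r.sum(axis=0), 1e-12, None)
        pi = weight / weight.sum()
        p = np.clip((r * m[:, None]).sum(axis=0) / (n_dyads * weight),
                    1e-6, 1 - 1e-6)
    return pi, p, r

# Simulated population: 10 sparse and 10 dense undirected networks on 10 nodes.
rng = np.random.default_rng(0)
n_nodes = 10
n_dyads = n_nodes * (n_nodes - 1) // 2
true_density = [0.1] * 10 + [0.6] * 10
networks = [np.triu(rng.random((n_nodes, n_nodes)) < d, k=1) for d in true_density]
edge_counts = [int(a.sum()) for a in networks]

pi, p, r = em_network_mixture(edge_counts, n_dyads)
labels = r.argmax(axis=1)  # hard cluster assignment for each network
```

In this sketch the responsibilities `r` play the role of posterior cluster membership probabilities for whole networks, which is what distinguishes clustering a population of networks from clustering nodes within a single network.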
