Variational Bayesian inference and complexity control for stochastic block models

It is now widely accepted that knowledge can be acquired from networks by clustering their vertices according to their connection profiles. Many methods have been proposed; in this paper we concentrate on the Stochastic Block Model (SBM). The clustering of vertices and the estimation of SBM parameters have been the subject of previous work, and numerous inference strategies, such as variational expectation maximization (EM) and classification EM, have been proposed. However, SBM still lacks criteria for estimating the number of components in the mixture. To our knowledge, only one model-based criterion, the Integrated Completed Likelihood (ICL), has been derived for SBM in the literature. It relies on an asymptotic approximation of the integrated complete-data likelihood, and recent studies have shown that it tends to be too conservative for small networks. To tackle this issue, we propose a new criterion, which we call Integrated Likelihood Variational Bayes (ILvb), based on a non-asymptotic approximation of the marginal likelihood. We describe how the criterion can be computed through a variational Bayes EM algorithm.
