Variational Inference for Stochastic Block Models From Sampled Data

Abstract

This article deals with dyads that are not observed when a network is sampled, and with the resulting issues for inference in the stochastic block model (SBM). We review sampling designs and identify which of them lead to missing at random (MAR) and which to not missing at random (NMAR) conditions for the SBM. We introduce variants of the variational EM algorithm for inferring the SBM under various sampling designs, both MAR and NMAR, all of which are available in an R package. Model selection criteria based on the integrated classification likelihood (ICL) are derived to select both the number of blocks and the sampling design. We investigate the accuracy and range of applicability of these algorithms through simulations. Finally, we explore two real-world networks, one from ethnology (a seed circulation network) and one from biology (a protein–protein interaction network), whose interpretation depends considerably on the sampling design considered. Supplementary materials for this article are available online.
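As a rough illustration of the MAR case only, the Python sketch below fits a binary, undirected SBM by variational EM when just a subset of dyads is observed, then scores the number of blocks with an ICL-type criterion. It assumes a MAR design whose parameters are distinct from the SBM parameters, so the sampling mechanism is ignorable and unobserved dyads simply drop out of the likelihood; NMAR designs, which require modeling the sampling process jointly with the network, are not covered. This is not the article's R implementation: the function names (`spectral_init`, `m_step`, `vem_sbm_mar`, `icl`), the spectral k-means initialization, and the use of the number of observed dyads in the ICL penalty are hypothetical choices made for this sketch.

```python
# Hypothetical sketch of variational EM for a Bernoulli SBM under an ignorable
# (MAR) dyad sampling design; not the article's R package.
import numpy as np

EPS = 1e-6


def spectral_init(Yo, Q):
    """Hard initial clustering: k-means on the top-Q eigenvectors (by |eigenvalue|)
    of the zero-imputed observed adjacency matrix (an assumed heuristic)."""
    n = Yo.shape[0]
    if Q == 1:
        return np.ones((n, 1))
    vals, vecs = np.linalg.eigh(Yo)
    X = vecs[:, np.argsort(-np.abs(vals))[:Q]]
    centers = [X[0]]  # farthest-point seeding
    for _ in range(1, Q):
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(20):  # Lloyd iterations
        labels = ((X[:, None, :] - centers[None]) ** 2).sum(-1).argmin(axis=1)
        centers = np.array([X[labels == q].mean(axis=0) if np.any(labels == q)
                            else centers[q] for q in range(Q)])
    tau = np.full((n, Q), EPS)
    tau[np.arange(n), labels] = 1.0
    return tau / tau.sum(axis=1, keepdims=True)


def m_step(tau, Yo, R):
    """Block proportions alpha and connectivities pi, counting observed dyads only."""
    alpha = np.maximum(tau.mean(axis=0), EPS)
    pi = (tau.T @ Yo @ tau) / np.maximum(tau.T @ R @ tau, EPS)
    return alpha, np.clip(pi, EPS, 1 - EPS)


def vem_sbm_mar(Y, R, Q, n_iter=50):
    """Variational EM for an undirected Bernoulli SBM seen through mask R
    (R[i, j] = 1 if dyad (i, j) was sampled). Unsampled dyads are skipped."""
    R = R.astype(float)
    np.fill_diagonal(R, 0.0)  # no self-loops
    Yo = Y * R                # observed edges
    Ro = R - Yo               # observed non-edges
    tau = spectral_init(Yo, Q)
    for _ in range(n_iter):
        alpha, pi = m_step(tau, Yo, R)
        # VE-step: log tau_iq = log alpha_q + sum_j sum_l tau_jl log f(Y_ij; pi_ql) + const
        log_tau = (np.log(alpha) + Yo @ tau @ np.log(pi).T
                   + Ro @ tau @ np.log(1 - pi).T)
        log_tau -= log_tau.max(axis=1, keepdims=True)
        tau = np.exp(log_tau)
        tau /= tau.sum(axis=1, keepdims=True)
    alpha, pi = m_step(tau, Yo, R)  # final M-step so (tau, alpha, pi) are consistent
    return tau, alpha, pi


def icl(Y, R, tau, alpha, pi):
    """Variational ICL: expected complete-data log-likelihood minus a BIC-type
    penalty; putting the observed-dyad count in the penalty is an assumption."""
    R = R.astype(float)
    np.fill_diagonal(R, 0.0)
    Yo = Y * R
    Ro = R - Yo
    n, Q = tau.shape
    # each unordered dyad appears twice in the double sum, hence the factor 0.5
    ll = np.sum(tau * np.log(alpha)) + 0.5 * np.sum(
        tau * (Yo @ tau @ np.log(pi).T + Ro @ tau @ np.log(1 - pi).T))
    n_obs = R.sum() / 2
    penalty = 0.5 * (Q - 1) * np.log(n) + 0.5 * (Q * (Q + 1) / 2) * np.log(n_obs)
    return ll - penalty


# Usage: two-block SBM, each dyad sampled independently with probability 0.7 (MAR)
rng = np.random.default_rng(1)
z = np.repeat([0, 1], 30)
P = np.where(z[:, None] == z[None, :], 0.8, 0.05)
Y = np.triu(rng.random(P.shape) < P, 1).astype(float)
Y = Y + Y.T
R = np.triu(rng.random(P.shape) < 0.7, 1).astype(float)
R = R + R.T
fits = {Q: vem_sbm_mar(Y, R, Q) for Q in (1, 2, 3)}
scores = {Q: icl(Y, R, *fit) for Q, fit in fits.items()}
best_Q = max(scores, key=scores.get)
```

On this simulated example the two-block fit should recover the planted partition up to label switching, and its ICL should exceed that of the one-block fit. A deterministic spectral start is used because the uniform assignment (every node equally likely to belong to every block) is itself a fixed point of the variational EM iterations, and random starts close to it can stall there.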
