Random graphs with node and block effects: models, goodness-of-fit tests, and applications to biological networks

Many popular models from the networks literature can be viewed through a common lens, which we describe here as the class of log-linear ERGMs. The class includes degree-based models, stochastic blockmodels, and combinations of the two. Given the interest in combined node and block effects in network formation mechanisms, we introduce a general directed relative of the degree-corrected stochastic blockmodel: an exponential family model we call p1-SBM, which generalizes several well-known variants of the blockmodel. We study the problem of testing model fit for the log-linear ERGM class. Our approach combines fast estimation algorithms borrowed from the contingency table literature with effective sampling methods rooted in graph theory and algebraic statistics; the result is an exact test whose p-value can be approximated efficiently for networks of moderate size. We showcase the performance of the method on two data sets from biology: the connectome of C. elegans and the interactome of Arabidopsis thaliana. These two networks, a neuronal network and a protein-protein interaction network, have been popular examples in the network science literature, but a model-based approach to studying them has been missing thus far.
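
To make the idea of a sampling-based exact test concrete, here is a minimal Python sketch of a Monte Carlo conditional test in the simplest setting: an undirected simple graph whose degree sequence is held fixed. It is not the authors' algorithm and does not implement p1-SBM. It assumes a degree-only model, uses degree-preserving double edge switches as the sampler, and uses the triangle count as a stand-in test statistic. All function names (`switch_step`, `mc_p_value`, and so on) are hypothetical.

```python
import random
from itertools import combinations


def degree_sequence(edges, nodes):
    """Return the degree of each node, in the order given by `nodes`."""
    deg = {v: 0 for v in nodes}
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return [deg[v] for v in nodes]


def count_triangles(edge_set, nodes):
    """Count triangles; used here as an illustrative test statistic (an assumption)."""
    adj = {v: set() for v in nodes}
    for e in edge_set:
        u, v = tuple(e)
        adj[u].add(v)
        adj[v].add(u)
    return sum(
        1
        for u, v, w in combinations(nodes, 3)
        if v in adj[u] and w in adj[u] and w in adj[v]
    )


def switch_step(edges, edge_set, rng):
    """Attempt one double edge switch: {a-b, c-d} -> {a-d, c-b}.

    The move is rejected (graph left unchanged) if it would create a
    self-loop or a duplicate edge, so degrees and simplicity are preserved.
    """
    i, j = rng.sample(range(len(edges)), 2)
    a, b = edges[i]
    c, d = edges[j]
    if rng.random() < 0.5:  # pick one of the two possible rewirings
        c, d = d, c
    new1, new2 = frozenset((a, d)), frozenset((c, b))
    if len(new1) < 2 or len(new2) < 2 or new1 in edge_set or new2 in edge_set:
        return False
    edge_set.difference_update({frozenset((a, b)), frozenset((c, d))})
    edge_set.update({new1, new2})
    edges[i], edges[j] = (a, d), (c, b)
    return True


def mc_p_value(edges, nodes, n_samples=1000, thin=50, seed=0):
    """Approximate a conditional p-value given the observed degree sequence."""
    rng = random.Random(seed)
    edges = list(edges)
    edge_set = {frozenset(e) for e in edges}
    observed = count_triangles(edge_set, nodes)
    hits = 1  # the observed graph counts as one draw
    for _ in range(n_samples):
        for _ in range(thin):
            switch_step(edges, edge_set, rng)
        if count_triangles(edge_set, nodes) >= observed:
            hits += 1
    return hits / (n_samples + 1)


if __name__ == "__main__":
    nodes = list(range(6))
    edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (4, 5), (3, 5)]
    print(mc_p_value(edges, nodes, n_samples=500, thin=20, seed=1))
```

Rejected switches leave the chain in place, which keeps the proposal symmetric, so the chain targets the uniform distribution over simple graphs with the observed degrees. The `thin` and `n_samples` values are arbitrary choices for illustration; a real analysis would need to assess mixing and would typically use a statistic built from fitted model values.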
