Testing biological network motif significance with exponential random graph models

Analyses of the structure of biological networks often use statistical tests to establish the over-representation of motifs, which are thought to be important building blocks of such networks and related to their biological functions. However, there is disagreement about the statistical significance of these motifs, and standard methods for estimating this significance have potential problems. Exponential random graph models (ERGMs) are a class of statistical models that can overcome some of the shortcomings of commonly used methods for testing the statistical significance of motifs. ERGMs were first introduced into the bioinformatics literature over 10 years ago but have had limited application to biological networks, possibly because of the practical difficulty of estimating their parameters. Advances in estimation algorithms now make the analysis of much larger networks feasible in practical time. We illustrate the application of ERGMs to both an undirected protein–protein interaction (PPI) network and directed gene regulatory networks. The ERGMs indicate over-representation of triangles in the PPI network and confirm previous findings of over-representation of transitive triangles (the feed-forward loop) in an E. coli and a yeast regulatory network. Using ERGMs, we also confirm previous research showing that the under-representation of the cyclic triangle (the feedback loop) can be explained as a consequence of other topological features.
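The key idea above, judging whether triangles are over-represented conditional on other structure in a single probability model, can be illustrated with a toy ERGM. The sketch below is not the authors' code or their estimation algorithm. It assumes an undirected graph and just two terms, edges and triangles, so that P(G) = exp(theta_edges * edges(G) + theta_tri * triangles(G)) / kappa(theta). On a 5-node graph there are only 1,024 possible graphs, so kappa(theta) can be computed exactly by enumeration and the maximum-likelihood estimate found with damped Newton-Raphson. Real biological networks are far too large for this, and software such as the R ergm package uses MCMC-based estimators instead. The example graph and all function names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def edge_triangle_stats(n, edges):
    """Return (number of edges, number of triangles) of an undirected graph on n nodes."""
    edge_set = {frozenset(e) for e in edges}
    triangles = sum(
        1
        for a, b, c in combinations(range(n), 3)
        if frozenset((a, b)) in edge_set
        and frozenset((a, c)) in edge_set
        and frozenset((b, c)) in edge_set
    )
    return (len(edge_set), triangles)


def support_counts(n):
    """Count how many graphs on n nodes share each (edges, triangles) pair.

    Exhaustive enumeration: only feasible for tiny n (2 ** (n*(n-1)/2) graphs).
    """
    dyads = list(combinations(range(n), 2))
    counts = Counter()
    for mask in range(1 << len(dyads)):
        graph = [d for i, d in enumerate(dyads) if (mask >> i) & 1]
        counts[edge_triangle_stats(n, graph)] += 1
    return counts


def moments(theta, counts):
    """Exact log kappa(theta), mean and covariance of (edges, triangles) under the ERGM."""
    logw = {s: math.log(c) + theta[0] * s[0] + theta[1] * s[1] for s, c in counts.items()}
    top = max(logw.values())  # shift for numerically stable log-sum-exp
    w = {s: math.exp(v - top) for s, v in logw.items()}
    z = sum(w.values())
    mean = [sum(w[s] * s[k] for s in w) / z for k in range(2)]
    cov = [[sum(w[s] * (s[i] - mean[i]) * (s[j] - mean[j]) for s in w) / z
            for j in range(2)] for i in range(2)]
    return top + math.log(z), mean, cov


def fit_exact_ergm(n, observed_edges, tol=1e-8, max_iter=100):
    """MLE of theta = (theta_edges, theta_tri) by Newton-Raphson with backtracking."""
    counts = support_counts(n)
    obs = edge_triangle_stats(n, observed_edges)
    theta = [0.0, 0.0]

    def loglik(th):
        # log P(observed graph) = theta . s(obs) - log kappa(theta)
        return th[0] * obs[0] + th[1] * obs[1] - moments(th, counts)[0]

    for _ in range(max_iter):
        _, mean, cov = moments(theta, counts)
        grad = [obs[0] - mean[0], obs[1] - mean[1]]  # score = s(obs) - E[s]
        if max(abs(g) for g in grad) < tol:
            break
        det = cov[0][0] * cov[1][1] - cov[0][1] * cov[1][0]
        step = [(cov[1][1] * grad[0] - cov[0][1] * grad[1]) / det,
                (cov[0][0] * grad[1] - cov[1][0] * grad[0]) / det]
        # Halve the Newton step until the log-likelihood does not decrease.
        t, current = 1.0, loglik(theta)
        while t > 1e-8:
            cand = [theta[0] + t * step[0], theta[1] + t * step[1]]
            if loglik(cand) >= current - 1e-12:
                break
            t /= 2
        theta = cand
    _, mean, _ = moments(theta, counts)
    return theta, obs, mean


# Hypothetical example: K4 minus one edge plus an isolated node (5 edges, 2 triangles).
theta, obs, mean = fit_exact_ergm(5, [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3)])
print("observed (edges, triangles):", obs)
print("theta_edges = %.3f, theta_tri = %.3f" % tuple(theta))
```

At the MLE the model's expected statistics equal the observed ones, which the returned mean lets you check. Here a random graph of the same density would have 1.25 triangles on average, fewer than the 2 observed, so the fitted theta_tri comes out positive: triangles are over-represented given the edge count. A real significance test would also need standard errors, which this sketch omits.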
