An inferential framework for biological network hypothesis tests

Background: Networks are ubiquitous in modern cell biology and physiology. A large literature exists for inferring/proposing biological pathways/networks using statistical or machine learning algorithms. Despite these advances, a formal testing procedure for analyzing network-level observations is in need of further development. Comparing the behaviour of a pharmacologically altered pathway to its canonical form is an example of a salient one-sample comparison. Locating which pathways differentiate disease from no-disease phenotype may be recast as a two-sample network inference problem.

Results: We outline an inferential method for performing one- and two-sample hypothesis tests where the sampling unit is a network and the hypotheses are stated via network model(s). We propose a dissimilarity measure that incorporates nearby neighbour information to contrast one or more networks in a statistical test. We demonstrate and explore the utility of our approach with both simulated and microarray data; random graphs and weighted (partial) correlation networks are used to form network models. Using both a well-known diabetes dataset and an ovarian cancer dataset, the methods outlined here could better elucidate co-regulation changes for one or more pathways between two clinically relevant phenotypes.

Conclusions: Formal hypothesis tests for gene- or protein-based networks are a logical progression from existing gene-based and gene-set tests for differential expression. Commensurate with the growing appreciation and development of systems biology, the dissimilarity-based testing methods presented here may allow us to improve our understanding of pathways and other complex regulatory systems. The benefit of our method was illustrated under select scenarios.
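
To make the two-sample setting concrete, the following is a minimal sketch of a dissimilarity-based permutation test, not the authors' procedure. It assumes each phenotype is summarised by a weighted correlation network built from an expression matrix (rows are samples, columns are genes), and it swaps in a simple mean absolute edge-weight difference as the dissimilarity; the neighbour-aware measure proposed in the paper is not reproduced here. All function names (`correlation_network`, `network_dissimilarity`, `two_sample_network_test`) are hypothetical.

```python
import numpy as np


def correlation_network(expr):
    """Weighted network: absolute Pearson correlation between gene columns."""
    adj = np.abs(np.corrcoef(expr, rowvar=False))
    np.fill_diagonal(adj, 0.0)  # no self-loops
    return adj


def network_dissimilarity(a, b):
    """Assumed stand-in measure: mean absolute difference of edge weights."""
    iu = np.triu_indices_from(a, k=1)  # each undirected edge counted once
    return float(np.mean(np.abs(a[iu] - b[iu])))


def two_sample_network_test(x, y, n_perm=500, seed=0):
    """Permutation test of H0: both phenotypes share one network model.

    Sample labels are shuffled, both networks are re-estimated, and the
    observed dissimilarity is compared with the permutation distribution.
    """
    rng = np.random.default_rng(seed)
    observed = network_dissimilarity(correlation_network(x),
                                     correlation_network(y))
    pooled = np.vstack([x, y])
    n_x = x.shape[0]
    exceed = 0
    for _ in range(n_perm):
        idx = rng.permutation(pooled.shape[0])
        d = network_dissimilarity(correlation_network(pooled[idx[:n_x]]),
                                  correlation_network(pooled[idx[n_x:]]))
        exceed += d >= observed
    # Add-one correction keeps the estimated p-value strictly positive.
    p_value = (1 + exceed) / (1 + n_perm)
    return observed, p_value
```

Calling `two_sample_network_test(x, y)` on two expression matrices with the same gene columns returns the observed dissimilarity and a permutation p-value. Any other network estimator or dissimilarity (for example, one built on partial correlations) could be substituted without changing the permutation logic.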
