Constrained Covariance Matrices With a Biologically Realistic Structure: Comparison of Methods for Generating High-Dimensional Gaussian Graphical Models

High-dimensional data from molecular biology possess an intricate correlation structure that is imposed by the molecular interactions between genes and their products, which form various types of gene networks. This is particularly well known for gene expression data, because enough large-scale data sets amenable to sensible statistical analysis are available to confirm it. The purpose of this paper is twofold. First, we investigate three methods for generating constrained covariance matrices with a biologically realistic structure. Such covariance matrices play a pivotal role in designing novel statistical methods for high-dimensional biological data, because they allow one to define Gaussian graphical models (GGMs) for simulating realistic data, including their correlation structure. We study local and global characteristics of these covariance matrices and of the derived concentration and partial correlation matrices. Second, we connect these results, obtained from a probabilistic perspective, to the statistical results of studies that aim to estimate gene regulatory networks from biological data. This connection sheds light on the well-known heterogeneity of statistical estimation methods for inferring gene regulatory networks and provides an explanation for the difficulty of inferring molecular interactions between highly connected genes.
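The abstract states that such covariance matrices make it possible to define Gaussian graphical models for simulating data with a realistic correlation structure. This section does not specify the three generation methods compared in the paper, so the following is only a minimal sketch of the general idea under stated assumptions. The edges of a network fix the zero pattern of a concentration (precision) matrix Ω. The covariance matrix is then Σ = Ω^(-1), the partial correlations are ρ_ij = -ω_ij / sqrt(ω_ii ω_jj), and samples are drawn from N(0, Σ). The Erdős–Rényi placeholder graph, the edge-weight range [0.2, 0.6], the diagonal-dominance margin of 1, and all function names are hypothetical choices for illustration, not the paper's construction. A real gene network's edge list could be passed in place of `random_graph`.

```python
import math
import random

# Minimal, generic sketch of simulating Gaussian graphical model (GGM) data.
# Assumption: this is NOT one of the paper's three methods; it uses a common
# "diagonally dominant precision matrix" construction for illustration only.


def random_graph(p, edge_prob, rng):
    """Hypothetical placeholder network: Erdos-Renyi edges (i, j) with i < j."""
    return {(i, j) for i in range(p) for j in range(i + 1, p)
            if rng.random() < edge_prob}


def precision_from_graph(p, edges, rng, low=0.2, high=0.6):
    """Concentration matrix Omega that is zero wherever the graph has no edge."""
    omega = [[0.0] * p for _ in range(p)]
    for i, j in edges:
        w = rng.uniform(low, high) * rng.choice((-1.0, 1.0))
        omega[i][j] = omega[j][i] = w
    # Strict diagonal dominance with a positive diagonal makes the symmetric
    # matrix positive definite (Gershgorin), so Omega is a valid precision matrix.
    for i in range(p):
        omega[i][i] = sum(abs(x) for x in omega[i]) + 1.0
    return omega


def partial_correlations(omega):
    """rho_ij = -omega_ij / sqrt(omega_ii * omega_jj), with 1 on the diagonal."""
    p = len(omega)
    return [[1.0 if i == j else -omega[i][j] / math.sqrt(omega[i][i] * omega[j][j])
             for j in range(p)] for i in range(p)]


def invert(a):
    """Gauss-Jordan inversion with partial pivoting (Sigma = Omega^-1)."""
    n = len(a)
    m = [row[:] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        piv = m[col][col]
        m[col] = [x / piv for x in m[col]]
        for r in range(n):
            if r != col and m[r][col] != 0.0:
                f = m[r][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [row[n:] for row in m]


def cholesky(a):
    """Lower-triangular L with a = L L^T (a must be symmetric positive definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def sample(sigma, n, rng):
    """Draw n samples from N(0, Sigma) as x = L z with z ~ N(0, I)."""
    chol = cholesky(sigma)
    p = len(sigma)
    data = []
    for _ in range(n):
        z = [rng.gauss(0.0, 1.0) for _ in range(p)]
        data.append([sum(chol[i][k] * z[k] for k in range(i + 1)) for i in range(p)])
    return data


if __name__ == "__main__":
    rng = random.Random(0)
    p = 10
    edges = random_graph(p, 0.2, rng)
    omega = precision_from_graph(p, edges, rng)
    sigma = invert(omega)                 # covariance matrix of the GGM
    pcor = partial_correlations(omega)    # zero exactly on non-edges
    data = sample(sigma, 500, rng)        # 500 x 10 simulated data matrix
    print(len(edges), "edges;", len(data), "samples of", p, "variables")
```

Diagonal dominance is used here because it guarantees positive definiteness cheaply. The cost is that the resulting partial correlations tend to be weak, so it should be read as a convenience for the sketch rather than a claim about biological realism.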
