Reconstruction of large-scale regulatory networks based on perturbation graphs and transitive reduction: improved methods and their evaluation

Background: The data-driven inference of intracellular networks is one of the key challenges of computational and systems biology. As suggested by recent works, a simple yet effective approach for reconstructing regulatory networks comprises two steps. First, the observed effects induced by directed perturbations are collected in a signed and directed perturbation graph (PG). Second, Transitive Reduction (TR) is used to identify and eliminate those edges in the PG that can be explained by paths and are therefore likely to reflect indirect effects.

Results: In this work we introduce novel variants of PG generation and TR that lead to significantly improved performance. The key modifications are: (i) the use of novel statistical criteria for deriving a high-quality PG from experimental data; (ii) the application of local TR, which allows only short paths to explain (and remove) a given edge; and (iii) a novel strategy to rank the edges with respect to their confidence. To compare the new methods with existing ones, we apply them not only to a recent DREAM network inference challenge but also to a novel synthetic compendium consisting of 30 networks of 5000 genes each, simulated with varying biological and measurement error variances, resulting in a total of 270 datasets. The benchmarks clearly demonstrate the superior reconstruction performance of the novel PG and TR variants compared to existing approaches. Moreover, the benchmark enabled us to draw some general conclusions. For example, local TR restricted to paths of length two is often sufficient or even favorable. We also demonstrate that considering edge weights is highly beneficial for TR, whereas considering edge signs is of minor importance. We explain these observations from a graph-theoretical perspective and discuss the consequences, in particular the greatly reduced computational demand of TR.
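
To make the idea of local TR concrete, the following Python fragment is a minimal sketch, not the implementation evaluated in this work. It assumes a hypothetical representation of the PG as a dictionary mapping each edge to a sign (+1 or -1) and a confidence value (higher means more reliable). It removes an edge only if a path of length two exists whose overall sign matches (optional) and whose two edges are both more confident than the edge itself. The function name, the data layout, the removal criterion, and the choice to test all edges against the original PG are illustrative assumptions.

```python
from collections import defaultdict


def local_transitive_reduction(pg, use_signs=True):
    """Sketch of local TR restricted to explaining paths of length two.

    pg: dict mapping (source, target) -> (sign, confidence), where sign is
        +1 or -1 and a higher confidence means a more reliable edge.
        (Hypothetical data layout chosen for this illustration.)
    Returns a new dict containing only the edges that are not explained.
    """
    successors = defaultdict(set)
    for (u, v) in pg:
        successors[u].add(v)

    kept = {}
    for (u, v), (sign, conf) in pg.items():
        explained = False
        for k in successors[u]:
            # Skip trivial intermediates and nodes without an edge k -> v.
            if k == u or k == v or (k, v) not in pg:
                continue
            s1, c1 = pg[(u, k)]
            s2, c2 = pg[(k, v)]
            # The path sign is the product of its edge signs.
            sign_ok = (not use_signs) or (s1 * s2 == sign)
            # Assumed weight rule: both path edges must be more confident
            # than the edge they are supposed to explain.
            if sign_ok and min(c1, c2) > conf:
                explained = True
                break
        if not explained:
            kept[(u, v)] = (sign, conf)
    return kept


# Toy example: A -> C is weak and is explained by the path A -> B -> C.
toy_pg = {
    ("A", "B"): (+1, 0.9),
    ("B", "C"): (+1, 0.8),
    ("A", "C"): (+1, 0.3),
}
reduced = local_transitive_reduction(toy_pg)
print(sorted(reduced))
```

Because only the direct successors of the source node are inspected for each edge, this sketch avoids any search over longer paths, which illustrates why restricting TR to short paths reduces the computational demand.
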
Finally, as a realistic application scenario, we use our framework to infer gene interactions in yeast from a library of gene expression data measured in mutants with single knockouts of transcription factors. The reconstructed network shows a significant enrichment of known interactions, especially among the 100 most confident edges, which are also the most relevant for experimental validation.

Conclusions: This paper presents several major achievements. The novel methods introduced herein can be seen as the state of the art for inference techniques relying on perturbation graphs and transitive reduction. Another key result of the study is a new large-scale in silico benchmark dataset that accounts for different noise levels and provides a solid basis for unbiased testing of network inference methodologies. Finally, applying our approach to Saccharomyces cerevisiae suggested several new high-confidence gene interactions that await experimental validation.
