Inferring Regulatory Networks by Combining Perturbation Screens and Steady State Gene Expression Profiles

Reconstructing transcriptional regulatory networks is an important task in functional genomics. Data from experiments that perturb genes by knockouts or RNA interference contain useful information for this reconstruction problem. However, such data can be limited in size, expensive to acquire, or both. Observational data of the organism in steady state (e.g., wild-type) are more readily available, but on their own they carry too little information for the task. We develop a computational approach that uses both data sources to estimate a regulatory network. The approach is based on a three-step algorithm that estimates the underlying directed but cyclic network and takes as input both perturbation screens and steady-state gene expression data. In the first step, the algorithm determines causal orderings of the genes that are consistent with the perturbation data, by combining an exhaustive search method with a fast heuristic that in turn couples a Monte Carlo technique with a fast search algorithm. In the second step, a regulatory network is estimated for each obtained causal ordering using a penalized-likelihood method. In the third step, a consensus network is constructed from the highest-scoring networks. Extensive computational experiments show that the algorithm performs well in reconstructing the underlying network and clearly outperforms competing approaches that rely on only a single data source. Further, it is established that the algorithm produces a consistent estimate of the regulatory network.
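The three steps can be illustrated with a minimal Python sketch. It is not the authors' implementation, and it makes several simplifying assumptions that are not in the abstract: the perturbation data are summarized as a boolean matrix `influence` (entry `[i, j]` is true when perturbing gene i changed gene j), that influence graph is assumed to be acyclic, orderings are drawn by random topological sorting rather than by the exhaustive search plus Monte Carlo heuristic, the penalized likelihood is a plain lasso regression of each gene on its predecessors, and every sampled ordering votes equally instead of only the highest-scoring ones. All function names and parameters are hypothetical.

```python
import numpy as np


def random_topological_order(influence, rng):
    """Step 1 (simplified): draw one causal ordering consistent with the
    perturbation data, using Kahn's algorithm with random tie-breaking.
    Assumes the influence graph is acyclic (a simplification)."""
    p = influence.shape[0]
    indeg = influence.sum(axis=0).astype(int)
    ready = [j for j in range(p) if indeg[j] == 0]
    order = []
    while ready:
        i = ready.pop(int(rng.integers(len(ready))))
        order.append(int(i))
        for j in np.flatnonzero(influence[i]):
            indeg[j] -= 1
            if indeg[j] == 0:
                ready.append(int(j))
    if len(order) != p:
        raise ValueError("influence graph contains a cycle")
    return order


def lasso(X, y, lam, n_iter=200):
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam * ||b||_1."""
    n, k = X.shape
    b = np.zeros(k)
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(k):
            if col_sq[j] == 0:
                continue
            # Partial residual that excludes predictor j.
            r = y - X @ b + X[:, j] * b[j]
            rho = X[:, j] @ r / n
            # Soft-thresholding update.
            b[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
    return b


def estimate_network(X, order, lam):
    """Step 2 (simplified): regress each gene on the genes that precede it
    in the ordering. A[i, j] != 0 is read as an edge i -> j."""
    Xc = X - X.mean(axis=0)
    p = Xc.shape[1]
    A = np.zeros((p, p))
    for pos, j in enumerate(order):
        parents = order[:pos]
        if parents:
            A[parents, j] = lasso(Xc[:, parents], Xc[:, j], lam)
    return A


def consensus_network(X, influence, n_orders=20, lam=0.1, frac=0.5, seed=0):
    """Step 3 (simplified): keep edges found in at least `frac` of the
    networks estimated from the sampled orderings."""
    rng = np.random.default_rng(seed)
    votes = np.zeros(influence.shape)
    for _ in range(n_orders):
        order = random_topological_order(influence, rng)
        votes += estimate_network(X, order, lam) != 0
    return votes / n_orders >= frac
```

Here `X` is a samples-by-genes matrix of steady-state expression values. The ordering restricts each gene's candidate regulators to its predecessors, which is what lets the observational data resolve edge directions that they could not identify alone.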
