Experimental design for efficient identification of gene regulatory networks using sparse Bayesian models

Background: Identifying large gene regulatory networks is an important task, but acquiring data through perturbation experiments (e.g., gene switches, RNAi, heterozygotes) is expensive. It is thus desirable to use an identification method that effectively incorporates available prior knowledge, such as sparse connectivity, and that allows experiments to be designed so that maximal information is gained from each one.

Results: Our main contributions are twofold. First, we provide a method for consistent inference of network structure that incorporates prior knowledge about sparse connectivity. The algorithm is time efficient and robust to violations of model assumptions. Second, we show how to use it for optimal experimental design, substantially reducing the number of required experiments. We employ sparse linear models and show how to perform full Bayesian inference for them. Rather than estimating only a single maximum likelihood network, we compute a posterior distribution over networks, using a novel variant of the expectation propagation method. This representation of uncertainty enables effective experimental design in a standard statistical setting: experiments are selected so that they are maximally informative.

Conclusion: Few methods have addressed the design issue so far. Compared to the best-known of these, our method is more transparent and is shown to perform qualitatively better. That method must place hard and unrealistic constraints on the network structure merely for computational tractability, whereas ours requires none. We demonstrate reconstruction and optimal experimental design capabilities on tasks generated from realistic non-linear network simulators.

The methods described in the paper are available as a Matlab package at http://www.kyb.tuebingen.mpg.de/sparselinearmodel.
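The design idea in the abstract, choosing the experiment that is most informative under the current posterior, can be illustrated with a minimal sketch. This is not the paper's algorithm: the paper uses a sparsity-favouring prior with expectation propagation, whereas the sketch swaps in a Gaussian prior so that the posterior and the information gain are available in closed form. The function names, the noise variance, and the prior variance are all assumptions made for illustration.

```python
import numpy as np

def posterior(X, y, noise_var=0.1, prior_var=1.0):
    """Gaussian posterior over weights w for y = X @ w + noise.

    Assumes a zero-mean Gaussian prior with variance prior_var
    (a stand-in for the sparse prior used in the paper).
    """
    d = X.shape[1]
    precision = np.eye(d) / prior_var + X.T @ X / noise_var
    cov = np.linalg.inv(precision)
    mean = cov @ X.T @ y / noise_var
    return mean, cov

def information_gain(cov, x, noise_var=0.1):
    """Expected information gain (in nats) from measuring along x.

    For a Gaussian posterior this is 0.5 * log(1 + x' cov x / noise_var).
    """
    return 0.5 * np.log1p(x @ cov @ x / noise_var)

def select_experiment(cov, candidates, noise_var=0.1):
    """Return the index of the most informative candidate and all gains."""
    gains = np.array([information_gain(cov, x, noise_var) for x in candidates])
    return int(np.argmax(gains)), gains

# Two past experiments probed only the first coordinate.
X = np.array([[1.0, 0.0], [1.0, 0.0]])
y = np.array([1.0, 1.0])
mean, cov = posterior(X, y)

# Candidate experiments: probe the first or the second coordinate.
candidates = np.array([[1.0, 0.0], [0.0, 1.0]])
best, gains = select_experiment(cov, candidates)
```

In this toy setting the second candidate scores higher, because the posterior is still uncertain about the coordinate that no earlier experiment probed. With a non-Gaussian sparse prior the gain no longer has this closed form, which is where an approximate inference method such as expectation propagation comes in.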
