An Information Theoretic Approach to Reverse Engineering of Regulatory Gene Networks from Time-Course Data

One of the main aims of molecular biology is to understand how molecular components interact with each other and how gene function is regulated. Several methods have been developed to infer gene networks from steady-state data, but far less literature addresses time-course data, so developing algorithms that infer gene networks from time-series measurements remains an open challenge in bioinformatics research. To detect dependencies between genes at different time delays, we propose an approach to infer gene regulatory networks from time-series measurements that starts from a well-known algorithm based on information theory. In particular, we show how the ARACNE (Algorithm for the Reconstruction of Accurate Cellular Networks) algorithm can be used for gene regulatory network inference from time-course expression profiles. The resulting method is called TimeDelay-ARACNE. It extracts dependencies between pairs of genes at different time delays and quantifies these dependencies in terms of mutual information. The basic idea of the proposed algorithm is to detect time-delayed dependencies between expression profiles by assuming a stationary Markov Random Field as the underlying probabilistic model. Less informative dependencies are filtered out using an automatically calculated threshold, retaining the most reliable connections. TimeDelay-ARACNE can infer small local networks of time-regulated gene-gene interactions, detecting their direction and discovering cyclic interactions even when only a small-to-medium number of measurements is available. We test the algorithm both on synthetic networks and on microarray expression profiles. The microarray measurements cover parts of the S. cerevisiae cell cycle and E. coli SOS pathways. Our results are compared with those of two previously published approaches, Dynamic Bayesian Networks and systems of ODEs, showing that TimeDelay-ARACNE has good accuracy, recall and F-score for the network reconstruction task.
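The core idea of scoring a directed, time-delayed dependency by mutual information can be illustrated with a minimal Python sketch. It is not the TimeDelay-ARACNE implementation. It assumes a simple histogram (plug-in) estimator of mutual information, a fixed user-supplied threshold in place of the automatically calculated one, and no further pruning of indirect edges. The function names (`mutual_information`, `best_delay`, `infer_edges`) and the parameters `bins`, `max_delay` and `threshold` are hypothetical and chosen only for illustration.

```python
import numpy as np


def mutual_information(x, y, bins=4):
    """Histogram (plug-in) estimate of I(X;Y) in nats. Illustrative estimator only."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()                 # joint probabilities
    px = pxy.sum(axis=1, keepdims=True)       # marginal of x (column vector)
    py = pxy.sum(axis=0, keepdims=True)       # marginal of y (row vector)
    nz = pxy > 0                              # skip empty cells to avoid log(0)
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))


def best_delay(source, target, max_delay=3, bins=4):
    """Find the delay k >= 1 maximizing MI between source[t] and target[t + k]."""
    scores = {
        k: mutual_information(source[:-k], target[k:], bins)
        for k in range(1, max_delay + 1)
    }
    k_best = max(scores, key=scores.get)
    return k_best, scores[k_best]


def infer_edges(profiles, max_delay=3, threshold=0.3, bins=4):
    """Return directed edges (source, target, delay, mi) whose MI reaches the threshold.

    `threshold` is a fixed, hypothetical cutoff; the method described above
    calculates its threshold automatically.
    """
    edges = []
    for src, x in profiles.items():
        for dst, y in profiles.items():
            if src == dst:
                continue
            k, mi = best_delay(x, y, max_delay, bins)
            if mi >= threshold:
                edges.append((src, dst, k, mi))
    return edges


if __name__ == "__main__":
    # Synthetic example: gene "b" copies gene "a" with a lag of two time points.
    rng = np.random.default_rng(0)
    a = rng.normal(size=200)
    b = np.roll(a, 2)
    for edge in infer_edges({"a": a, "b": b}):
        print(edge)
```

Because the delayed profile of the target is compared with the earlier profile of the source, the score is asymmetric in the two genes, which is what allows a direction to be assigned to an edge. Histogram estimators are biased for short series, so with only a few time points a different estimator would likely be needed.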
