Transcriptional Regulatory Networks across Species - Evolution, Inference, and Refinement

Determining transcriptional regulatory networks is key to understanding biological systems. However, determining these networks experimentally remains difficult and time-consuming, while current computational methods for inferring them (typically from gene-expression data) achieve only modest accuracy. This limited accuracy can be attributed in part to the limitations of a single-organism approach. Computational biology has long used comparative and, more generally, evolutionary approaches to extend the reach and accuracy of its analyses. We therefore take an evolutionary approach to the inference of regulatory networks, which enables us both to study evolutionary models for these networks and to improve the accuracy of the inferred networks. Since regulatory networks evolve along with the genomes, we consider the regulatory networks for a family of organisms to be related to each other through the same phylogenetic tree that relates the genomes. These relationships contain information that can be used to improve the accuracy of inferred networks. Advances in the study of the evolution of regulatory networks provide evidence on which to base evolutionary models for these networks, and such models are an important component of our approach. We use two network evolutionary models: a basic model that considers only the gains and losses of regulatory connections during evolution, and an extended model that also takes into account the duplications and losses of genes. Using these models, we design refinement algorithms that exploit the phylogenetic relationships to refine noisy regulatory networks for a family of organisms. The refinement algorithms are RefineFast and RefineML, which are two-step iterative algorithms, and ProPhyC and ProPhyCC, which are based on a probabilistic phylogenetic model.
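As a rough illustration of the basic model only, the sketch below treats a single regulatory connection as a binary character that can be gained or lost along each branch of a small phylogenetic tree, and computes the posterior probability that the connection is present in each organism given noisy observations at the leaves. It is not an implementation of RefineFast, RefineML, ProPhyC, or ProPhyCC. The tree, the gain and loss probabilities, the root prior, and the error rate are all hypothetical values chosen for the example, and the posterior is obtained by brute-force enumeration, which is feasible only for very small trees.

```python
from itertools import product

# Hypothetical rooted tree, given as child -> parent ("root" has no parent).
PARENT = {"anc": "root", "A": "anc", "B": "anc", "C": "root"}
NODES = ["root", "anc", "A", "B", "C"]
LEAVES = ["A", "B", "C"]

# All parameter values below are assumptions made for illustration.
P_GAIN = 0.05  # P(connection absent in parent, present in child) per branch
P_LOSS = 0.10  # P(connection present in parent, absent in child) per branch
P_ROOT = 0.5   # prior P(connection present at the root)
P_ERR = 0.2    # P(a noisy leaf observation disagrees with the true state)


def transition(parent_state, child_state):
    """Probability of the child state given the parent state on one branch."""
    if parent_state == 1:
        return P_LOSS if child_state == 0 else 1 - P_LOSS
    return P_GAIN if child_state == 1 else 1 - P_GAIN


def joint(states, observed):
    """Joint probability of one full state assignment and the observations."""
    p = P_ROOT if states["root"] == 1 else 1 - P_ROOT
    for child, parent in PARENT.items():
        p *= transition(states[parent], states[child])
    for leaf in LEAVES:
        p *= (1 - P_ERR) if states[leaf] == observed[leaf] else P_ERR
    return p


def refine(observed):
    """Posterior P(connection present) at each leaf, by full enumeration."""
    total = 0.0
    present = {leaf: 0.0 for leaf in LEAVES}
    for values in product((0, 1), repeat=len(NODES)):
        states = dict(zip(NODES, values))
        p = joint(states, observed)
        total += p
        for leaf in LEAVES:
            if states[leaf] == 1:
                present[leaf] += p
    return {leaf: present[leaf] / total for leaf in LEAVES}


# The connection was observed in A and C but not in B.
posterior = refine({"A": 1, "B": 0, "C": 1})
for leaf in LEAVES:
    print(leaf, round(posterior[leaf], 3))
```

Under these assumed parameters, evidence from related organisms raises the posterior for a connection that was not observed in B, which is the intuition behind using phylogenetic relationships to refine noisy networks. A practical method would replace the enumeration with a dynamic-programming pass over the tree.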
For each algorithm, we first design it under the basic network evolutionary model and then generalize it to the extended model. All of these algorithms are computationally efficient, and extensive experimental results show that they substantially improve the quality of the noisy input networks. In particular, ProPhyC and ProPhyCC improve further on the performance of RefineFast and RefineML. Besides these four refinement algorithms, we also design an algorithm based on transfer learning theory, called tree transfer learning (TTL). TTL differs from the four refinement algorithms in that it takes as input the gene-expression data for the family of organisms, rather than their inferred noisy networks. TTL then learns the network structures for all the organisms at once, while taking advantage of the phylogenetic relationships. Although this approach outperforms an inference algorithm used alone, it does not perform better than ProPhyC, which indicates that the ProPhyC framework makes good use of the phylogenetic information.

Keywords: regulatory networks, network inference, evolution, phylogenetic relationships, ancestral network, refinement, gene duplication, evolutionary model, evolutionary history, reconciliation, orthology, maximum likelihood, transfer learning
