Construction of regulatory networks using expression time-series data of a genotyped population

The inference of regulatory and biochemical networks from large-scale genomics data is a basic problem in molecular biology. The goal is to generate testable hypotheses of gene-to-gene influences and subsequently to design bench experiments to confirm these network predictions. Coexpression of genes in large-scale gene-expression data implies coregulation and potential gene–gene interactions, but provides little information about the direction of influences. Here, we use both time-series data and genetics data to infer the directionality of edges in regulatory networks: time-series data contain information about the chronological order of regulatory events, and genetics data allow us to map DNA variations to variations at the RNA level. We generate microarray data measuring time-dependent gene-expression levels in 95 genotyped yeast segregants subjected to a drug perturbation. We develop a Bayesian model averaging regression algorithm that incorporates external information from diverse data types to infer regulatory networks from the time-series and genetics data. Our algorithm is capable of generating feedback loops. We show that our inferred network recovers existing regulatory relationships and predicts novel ones. Following network construction, we generate independent microarray data on selected deletion mutants to prospectively test network predictions. We demonstrate the potential of our network to discover de novo transcription-factor binding sites. Applying our construction method to previously published data demonstrates that our method is competitive with leading network construction algorithms in the literature.
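
To make the idea of Bayesian model averaging (BMA) regression on time-series data concrete, the following is a minimal sketch and not the algorithm of this paper. It assumes that a target gene's expression at one time point is regressed on candidate regulators' expression at the previous time point, that candidate models are all regulator subsets up to a small size, that model weights are approximated from BIC, and that all models are equally likely a priori (the paper instead incorporates external information, which this sketch omits). The function name `bma_inclusion_probs` and its parameters are hypothetical.

```python
import itertools
import numpy as np

def bma_inclusion_probs(X_prev, y_next, max_size=2):
    """Approximate posterior inclusion probabilities of candidate regulators.

    X_prev : (n, p) array, candidate regulator expression at time t-1 (assumed layout).
    y_next : (n,) array, target gene expression at time t.
    max_size : largest regulator subset enumerated (assumption, keeps the search small).
    """
    n, p = X_prev.shape
    models, bics = [], []
    for size in range(max_size + 1):
        for subset in itertools.combinations(range(p), size):
            # Design matrix: intercept plus the regulators in this subset.
            design = np.column_stack([np.ones(n)] + [X_prev[:, j] for j in subset])
            coef, *_ = np.linalg.lstsq(design, y_next, rcond=None)
            rss = float(np.sum((y_next - design @ coef) ** 2))
            rss = max(rss, 1e-12)  # guard against log(0) on a perfect fit
            k = design.shape[1]
            # BIC for a Gaussian linear model, up to an additive constant.
            bics.append(n * np.log(rss / n) + k * np.log(n))
            models.append(subset)
    bics = np.array(bics)
    # BIC approximation to posterior model probabilities under a uniform model prior.
    weights = np.exp(-0.5 * (bics - bics.min()))
    weights /= weights.sum()
    # Inclusion probability of a regulator: total weight of models containing it.
    probs = np.zeros(p)
    for subset, w in zip(models, weights):
        for j in subset:
            probs[j] += w
    return probs

if __name__ == "__main__":
    # Synthetic illustration only: regulator 1 drives the target, the others are noise.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(60, 4))
    y = 2.0 * X[:, 1] + 0.1 * rng.normal(size=60)
    print(bma_inclusion_probs(X, y))
```

In a network setting, a regulator whose inclusion probability for a given target exceeds a chosen threshold would contribute a directed edge from that regulator to the target; because each gene is modeled separately against earlier time points, cycles between genes are not excluded.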
