A Statistical Method for Constructing Transcriptional Regulatory Networks Using Gene Expression and Sequence Data

Transcriptional regulation is one of the most important means of gene regulation, and uncovering transcriptional regulatory networks helps us understand complex cellular processes. In this paper, we describe a statistical approach for constructing transcriptional regulatory networks from gene expression data, promoter sequences, and transcription factor binding sites. Our simulation studies show that both the overall error rate and the false positive rate of the estimated networks are expected to be small when the systematic noise in the constructed feature matrix is small. Our analysis of 658 microarray experiments on yeast gene expression programs and 46 transcription factors suggests that the method can identify significant transcriptional regulatory interactions and uncover the corresponding regulatory network structures.
