Discriminative Learning of Composite Transcriptional Regulatory Modules

A central goal of molecular biology is to uncover the mechanisms of transcriptional regulation that govern gene expression. Transcription factors play a key role in these mechanisms because they affect the transcription rates of genes. Often, such regulatory circuits involve not a single transcription factor but several factors that act in concert to modulate transcription. Recent advances in high-throughput assays, such as microarray experiments and chromatin immunoprecipitation (ChIP), allow us to infer groups of genes that are co-expressed or co-regulated. The challenge is to use this wealth of information to gain insight into transcriptional regulation. In this dissertation, we present a procedure for locating regulatory complexes in promoter regions, where a regulatory complex consists of the binding sites of a pair of transcription factors that act in cooperation. Our procedure takes a discriminative approach: it searches for regulatory complexes that are overabundant in the promoter regions of a target group of co-expressed genes but infrequent in a control group of genes outside the target group. This filters out phenomena shared by both groups, ideally leaving only the core motifs that distinguish the target genes. We demonstrate the applicability of our method by finding regulatory complexes in a genome-wide analysis of yeast.
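The discriminative criterion can be made concrete with a small sketch. The abstract does not state which binding-site model or statistic the procedure uses, so everything below is an assumption for illustration only: binding sites are exact consensus strings matched on one strand, a complex "occurs" in a promoter when sites for both factors start within `max_gap` bases of each other, and overabundance is scored with a hypergeometric tail probability over the pooled target and control genes. The names `site_starts`, `has_complex`, `hypergeom_tail`, and `complex_pvalue` are hypothetical.

```python
import re
from math import comb


def site_starts(promoter: str, site: str) -> list[int]:
    """Start positions of every (possibly overlapping) exact match of `site`."""
    # A zero-width lookahead lets re.finditer report overlapping matches.
    pattern = f"(?={re.escape(site)})"
    return [m.start() for m in re.finditer(pattern, promoter)]


def has_complex(promoter: str, site_a: str, site_b: str, max_gap: int) -> bool:
    """True if sites for both factors start within `max_gap` bases, in either order.

    Assumption: exact single-strand consensus matching stands in for a real
    binding-site model such as a position weight matrix.
    """
    starts_a = site_starts(promoter, site_a)
    starts_b = site_starts(promoter, site_b)
    return any(abs(i - j) <= max_gap for i in starts_a for j in starts_b)


def hypergeom_tail(N: int, K: int, n: int, k: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(N genes, K carrying the complex, n drawn)."""
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    # Python ints are exact, so rounding happens only in the final division.
    return hits / comb(N, n)


def complex_pvalue(target, control, site_a, site_b, max_gap):
    """Score how overabundant a candidate complex is in `target` versus `control`.

    A small value means the complex occurs in more target promoters than
    expected if the target genes were drawn at random from target + control.
    """
    k = sum(has_complex(p, site_a, site_b, max_gap) for p in target)
    k_control = sum(has_complex(p, site_a, site_b, max_gap) for p in control)
    N, n, K = len(target) + len(control), len(target), k + k_control
    return hypergeom_tail(N, K, n, k)


if __name__ == "__main__":
    # Toy promoters (invented for illustration, not real yeast sequence).
    target = ["TTCACGTGAAGATATT", "GGCACGTGCCGATACC", "AAAAAAAAAAAA"]
    control = ["CACGTGAAGATA", "TTTTTTTTTTTT", "GGGGGGGG", "CCCCCCCC", "ATATATAT"]
    p = complex_pvalue(target, control, "CACGTG", "GATA", max_gap=10)
    print(round(p, 4))  # 2 of 3 target vs 1 of 5 control genes: 16/56 = 0.2857
```

Pooling the two groups means the null hypothesis is that complex occurrence is independent of target membership, so a complex that is common everywhere receives an unremarkable score even if it appears in many target promoters. A full search along these lines would presumably enumerate many candidate factor pairs and gap limits, which would also call for some correction for multiple testing.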
