A hypothesis-driven approach to condition-specific transcription factor binding site characterization in S.c.

We demonstrate a computational process by which transcription factor binding sites can be elucidated using genome-wide expression and binding profiles. The profiles direct us to the intergenic regions likely to contain the promoter regions for a given factor. These sequences are then aligned by multiple local alignment to yield an anchor motif from which further characterization can proceed. We present the bases for, and assumptions about, the variability within these motifs; these give rise to potentially more accurate motifs, capture complex binding sites built upon the basis motif, and remove the constraints of currently employed promoter-searching protocols. We also present a measure of motif quality based on the occurrence of the putative motifs in regions observed to contain the binding sites. Together, the assumptions, motif generation, quality assessment, and comparison give the user as much control as their a priori knowledge allows.
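The abstract does not define the motif-quality measure, so the following is only a minimal sketch of one way an occurrence-based score could be computed. It assumes that a motif is written as an IUPAC consensus string, that both strands are scanned, and that quality is the fraction of bound regions containing the motif minus the fraction of unbound regions containing it. The function names, the scoring rule, and the toy sequences are all hypothetical illustrations, not the authors' method.

```python
import re

# IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

# Complement table covering the ambiguity codes as well as A/C/G/T.
_COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(motif):
    """Return the reverse complement of an IUPAC consensus string."""
    return motif.upper().translate(_COMPLEMENT)[::-1]


def motif_to_regex(motif):
    """Compile an IUPAC consensus string into a regular expression."""
    return re.compile("".join(IUPAC[base] for base in motif.upper()))


def contains_motif(sequence, motif):
    """True if the motif occurs on either strand of the sequence."""
    seq = sequence.upper()
    forward = motif_to_regex(motif)
    reverse = motif_to_regex(reverse_complement(motif))
    return bool(forward.search(seq) or reverse.search(seq))


def occurrence_fraction(sequences, motif):
    """Fraction of sequences containing at least one motif occurrence."""
    if not sequences:
        return 0.0
    hits = sum(contains_motif(s, motif) for s in sequences)
    return hits / len(sequences)


def motif_quality(bound, unbound, motif):
    """Hypothetical score: occurrence in bound regions minus occurrence
    in unbound (background) regions. Ranges from -1.0 to 1.0."""
    return occurrence_fraction(bound, motif) - occurrence_fraction(unbound, motif)


if __name__ == "__main__":
    # Toy sequences invented for illustration; not real intergenic regions.
    bound_regions = ["AACGGATACCGTT", "TTTTTTTT"]
    unbound_regions = ["AAAAAA", "GGGGGG"]
    print(motif_quality(bound_regions, unbound_regions, "CGGNNNCCG"))
```

A difference of fractions is only one possible choice. An enrichment ratio or a hypergeometric p-value over the same counts would serve the same role, and the bound and unbound sets would in practice come from the genome-wide binding profiles.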
