Automatic discovery of regulatory patterns in promoter regions based on whole cell expression data and functional annotation

MOTIVATION The whole genomes submitted to GenBank contain valuable information about gene function and upstream sequences, and whole-cell expression data provide valuable information on gene regulation. To use these large amounts of data for a biological understanding of the regulation of gene expression, new automatic methods for pattern finding are needed.

RESULTS Two word-analysis algorithms for the automatic discovery of regulatory sequence elements have been developed. We show that sequence patterns correlated with whole-cell expression data can be found by applying Kolmogorov-Smirnov tests to the raw data, thereby eliminating the need to cluster co-regulated genes. Regulatory elements have also been identified by systematically calculating the significance of correlations between words found in the functional annotation of genes and DNA words occurring in their promoter regions. Application of these algorithms to the Saccharomyces cerevisiae genome and publicly available DNA array data sets revealed a highly conserved 9-mer occurring in the upstream regions of genes coding for proteasomal subunits. Several other putative and known regulatory elements were also found.

AVAILABILITY Upon request.
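The first algorithm can be illustrated with a minimal sketch. It is one plausible reading of the approach, not the authors' implementation: for each DNA word, the expression values of genes whose upstream region contains the word are compared with those of all other genes using a two-sample Kolmogorov-Smirnov test. The function names (`ks_two_sample`, `score_words`), the input layout (dictionaries keyed by gene name) and the asymptotic p-value approximation are all assumptions made for illustration.

```python
import math


def ks_two_sample(xs, ys):
    """Two-sample Kolmogorov-Smirnov test.

    Returns (D, p), where D is the largest gap between the two empirical
    distribution functions and p is an asymptotic approximation
    (an assumption here; it is rough for very small samples).
    """
    xs, ys = sorted(xs), sorted(ys)
    n, m = len(xs), len(ys)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        v = min(xs[i], ys[j])
        # Step both empirical CDFs past every value equal to v.
        while i < n and xs[i] <= v:
            i += 1
        while j < m and ys[j] <= v:
            j += 1
        d = max(d, abs(i / n - j / m))
    en = math.sqrt(n * m / (n + m))
    lam = (en + 0.12 + 0.11 / en) * d
    if lam < 0.2:
        # The alternating series below converges poorly here; p is ~1.
        return d, 1.0
    p = 2.0 * sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                  for k in range(1, 101))
    return d, min(max(p, 0.0), 1.0)


def score_words(upstream, expression, k):
    """Rank all k-mers by how strongly their presence shifts expression.

    upstream:   dict gene -> upstream DNA sequence (hypothetical layout)
    expression: dict gene -> one raw expression value (hypothetical layout)
    Returns a list of (p, D, word) tuples sorted by ascending p.
    """
    genes = [g for g in upstream if g in expression]
    genes_with = {}
    for g in genes:
        seq = upstream[g]
        # Count each word at most once per gene (presence/absence).
        for w in {seq[i:i + k] for i in range(len(seq) - k + 1)}:
            genes_with.setdefault(w, set()).add(g)
    results = []
    for w, present in genes_with.items():
        inside = [expression[g] for g in present]
        outside = [expression[g] for g in genes if g not in present]
        if not inside or not outside:
            continue  # word occurs in every gene, so no comparison group
        d, p = ks_two_sample(inside, outside)
        results.append((p, d, w))
    results.sort()
    return results
```

Because the test works directly on the ranked raw values, no prior clustering of genes is required. In practice the resulting p-values would need correction for the number of words tested; that step is omitted from this sketch.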
