Models for prediction and recognition of eukaryotic promoters

Genome sequencing projects are bringing the importance of computer-assisted nucleotide sequence analysis to the attention of a steadily growing number of scientists. Gene prediction undoubtedly leads the list of important tasks in this context. It is also generally accepted that this task consists of recognizing the exon/intron structure of the coding region and predicting the corresponding promoter. However, there is much less agreement about exactly which sequences should be called a promoter.

In general, the promoter is an integral part of the gene and often makes sense only in the context of its own gene, especially if important parts of the regulation are determined outside of the promoter (for example, by an intron enhancer; Stamatoyannopoulos et al. 1997). The function of a promoter is to mediate and control the initiation of transcription of the part of a gene located immediately downstream of the promoter (38). This can occur either in an unregulated, permanent manner (constitutive transcription) or in a highly regulated fashion in which transcription is subject to the control of various extracellular and intracellular signals (regulated transcription). The DNA region required to fulfill this function can be determined by assays for promoter function in a heterologous context.

Unfortunately, this simple scheme becomes blurred in the case of highly regulated promoters. Complex regulation often involves many more elements than just the promoter, for example enhancers, locus control regions (LCRs), and/or scaffold/matrix attachment regions (S/MARs, reviewed in Boulikas 1996). If any of these units, which are functionally completely different from promoters, happens to be located adjacent to the promoter, delineating the promoter becomes difficult.
This may be one of the reasons why promoter prediction programs focus almost exclusively on proximal promoter regions or even just on the core promoter. Therefore, I will refer to a promoter mainly as the region that is necessary to achieve transcriptional initiation, although this region may not be sufficient to determine the complete regulation of a gene.

What is the basic aim of computer-assisted promoter recognition? The most obvious answer is to locate an important part of the regulatory region of a gene. However, promoter prediction can also be very useful in the context of gene prediction. The promoter by definition marks the beginning of the first exon of a gene, which is often difficult to predict, especially if the first exon is untranslated or very short; some genes even have more than one promoter and first exon. Promoter regions also contain information complementary to that in the exons and introns, because transcriptional regulation, which can play an important part in gene function, cannot be deduced from the predicted amino acid sequence. Once understood, the promoter may even yield first clues to the function of a completely anonymous protein, for example if the promoter is known to be tissue- or cell-specific. Predicting the functionality of a promoter would also be welcome in gene therapy, to improve expression from newly created vector constructs.

[1]  R Staden Computer methods to locate signals in nucleic acid sequences , 1984, Nucleic Acids Res..

[2]  Jean-Michel Claverie,et al.  Assessing the biological significance of primary structure consensus patterns using sequence databanks. I. Heat-shock and glucocorticoid control elements in eukaryotic promoters , 1985, Comput. Appl. Biosci..

[3]  G. Stormo,et al.  Identifying protein-binding sites from unaligned DNA fragments. , 1989, Proceedings of the National Academy of Sciences of the United States of America.

[4]  H. Stunnenberg,et al.  Repression of transcription mediated at a thyroid hormone response element by the v-erb-A oncogene product , 1989, Nature.

[5]  P. Bucher Weight matrix descriptions of four eukaryotic RNA polymerase II promoter elements derived from 502 unrelated promoter sequences. , 1990, Journal of molecular biology.

[6]  R. Conaway,et al.  Initiation of eukaryotic messenger RNA synthesis. , 1991, The Journal of biological chemistry.

[7]  S. McKnight,et al.  Anatomy of an enhancer , 1992 .

[8]  David Ghosh,et al.  Status of the transcription factors database (TFD) , 1993, Nucleic Acids Res..

[9]  J. Alwine,et al.  Transcriptional activation by simian virus 40 large T antigen: requirements for simple promoter structures containing either TATA or initiator elements with variable upstream factor binding sites , 1993, Journal of virology.

[10]  Y. Capetanaki,et al.  An E box in the desmin promoter cooperates with the E box and MEF‐2 sites of a distal enhancer to direct muscle‐specific transcription. , 1994, The EMBO journal.

[11]  Akinori Yonezawa,et al.  RNA secondary structure prediction using highly parallel computers , 1995, Comput. Appl. Biosci..

[12]  E. Wingender,et al.  A compilation of composite regulatory elements affecting gene transcription in vertebrates. , 1995, Nucleic acids research.

[13]  D. Reinberg,et al.  Common themes in assembly and function of eukaryotic transcription complexes. , 1995, Annual review of biochemistry.

[14]  Gary D. Stormo,et al.  MATRIX SEARCH 1.0: a computer program that scans DNA sequences for transcriptional elements using a database of weight matrices , 1995, Comput. Appl. Biosci..

[15]  J. Manley,et al.  Cooperation between core promoter elements influences transcriptional activity in vivo. , 1995, Proceedings of the National Academy of Sciences of the United States of America.

[16]  T. Werner,et al.  MatInd and MatInspector: new fast and versatile tools for detection of consensus matches in nucleotide sequence data. , 1995, Nucleic acids research.

[17]  A. Ruddell Transcription regulatory elements of the avian retroviral long terminal repeat. , 1995, Virology.

[18]  K Rippe,et al.  Action at a distance: DNA-looping and initiation of transcription. , 1995, Trends in biochemical sciences.

[19]  G. Stein,et al.  Contributions of nuclear architecture to transcriptional control. , 1995, International review of cytology.

[20]  J. Roberts,et al.  Corticotropin-releasing hormone stimulates proopiomelanocortin transcription by cFos-dependent and -independent pathways: characterization of an AP1 site in exon 1. , 1995, Molecular endocrinology.

[21]  D J Shapiro,et al.  Intrinsically Bent DNA in a Eukaryotic Transcription Factor Recognition Sequence Potentiates Transcription Activation , 1995, The Journal of Biological Chemistry.

[22]  M. Busslinger,et al.  Transcriptional activation of the fra-1 gene by AP-1 is mediated by regulatory sequences in the first intron , 1995, Molecular and cellular biology.

[23]  D A Nielsen,et al.  SSCP primer design based on single-strand DNA structure predicted by a DNA folding program. , 1995, Nucleic acids research.

[24]  Thomas Werner,et al.  GenomeInspector: a new approach to detect correlation patterns of elements on genomic sequences , 1996, Comput. Appl. Biosci..

[25]  A. Usheva,et al.  YY1 transcriptional initiator: protein interactions and association with a DNA site containing unpaired strands. , 1996, Proceedings of the National Academy of Sciences of the United States of America.

[26]  K Frech,et al.  Common modular structure of lentivirus LTRs. , 1996, Virology.

[27]  M. Shago,et al.  Isolation of a novel retinoic acid-responsive gene by selection of genomic fragments derived from CpG-island-enriched DNA , 1996, Molecular and cellular biology.

[28]  A. Roy,et al.  Core promoters and transcriptional control. , 1996, Trends in genetics : TIG.

[29]  J. Fickett Coordinate positioning of MEF2 and myogenin binding sites. , 1996, Gene.

[30]  T. Boulikas,et al.  Common structural features of replication origins in all life forms , 1996, Journal of cellular biochemistry.

[31]  E. Davidson,et al.  Modular cis-regulatory organization of developmentally expressed genes: two genes transcribed territorially in the sea urchin embryo, and additional examples. , 1996, Proceedings of the National Academy of Sciences of the United States of America.

[32]  L. Freedman,et al.  AP-1 regulation of the rat bone sialoprotein gene transcription is mediated through a TPA response element within a glucocorticoid response unit in the gene promoter. , 1996, Matrix biology : journal of the International Society for Matrix Biology.

[33]  T. Werner,et al.  GenomeInspector: basic software tools for analysis of spatial correlations between genomic structures within megabase sequences. , 1996, Genomics.

[34]  R. Kraus,et al.  Experimentally determined weight matrix definitions of the initiator and TBP binding site elements of promoters. , 1996, Nucleic acids research.

[35]  K. Rhee,et al.  Glucocorticoid regulation of a transcription factor that binds an initiator-like element in the murine thymidine kinase (Tk-1) promoter. , 1996, Molecular endocrinology.

[36]  M. Montminy Transcriptional activation: Something new to hang your HAT on , 1997, Nature.

[37]  J. Fickett,et al.  Eukaryotic promoter recognition. , 1997, Genome research.

[38]  M. Garcia-Blanco,et al.  TAR RNA decoys inhibit tat-activated HIV-1 transcription after preinitiation complex formation. , 1997, Nucleic acids research.

[39]  K Frech,et al.  Software for the analysis of DNA sequence elements of transcription , 1997, Comput. Appl. Biosci..

[40]  P Cramer,et al.  Functional association between promoter structure and transcript alternative splicing. , 1997, Proceedings of the National Academy of Sciences of the United States of America.

[41]  V. Higgins,et al.  Tandemly Repeated 147 bp Elements Cause Structural and Functional Variation in Divergent MAL Promoters of Saccharomyces cerevisiae , 1997, Yeast.

[42]  J. Claverie Computational methods for the identification of genes in vertebrate genomic sequences. , 1997, Human molecular genetics.

[43]  N. Nomura,et al.  Prediction of the coding sequences of unidentified human genes. VII. The complete sequences of 100 new cDNA clones from brain which can code for large proteins in vitro. , 1997, DNA research : an international journal for rapid publication of reports on genes and genomes.

[44]  Y. Murakami,et al.  Expression Profiles of Transcripts from 126 Open Reading Frames in the Entire Chromosome VI of Saccharomyces cerevisiae by Systematic Northern Analyses , 1997, Yeast.

[45]  K Frech,et al.  Specific modelling of regulatory units in DNA sequences. , 1997, Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing.

[46]  F. Robert,et al.  RAP74 induces promoter contacts by RNA polymerase II upstream and downstream of a DNA bend centered on the TATA box. , 1997, Proceedings of the National Academy of Sciences of the United States of America.

[47]  A Renner,et al.  RNA structures and folding: from conventional to new issues in structure predictions. , 1997, Current opinion in structural biology.

[48]  E. Davidson,et al.  The hardwiring of development: organization and function of genomic regulatory systems. , 1997, Development.

[49]  R. Tjian,et al.  Mechanisms of transcriptional activation: differences and similarities between yeast, Drosophila, and man. , 1997, Current opinion in genetics & development.

[50]  C Gaspin,et al.  ESSA: an integrated and interactive computer tool for analysing RNA secondary structure. , 1997, Nucleic acids research.

[51]  E. Olson,et al.  Modular regulation of muscle gene transcription: a mechanism for muscle cell diversity. , 1997, Trends in genetics : TIG.

[52]  C. Glass,et al.  Nuclear integration of JAK/STAT and Ras/AP-1 signaling by CBP and p300. , 1997, Proceedings of the National Academy of Sciences of the United States of America.

[53]  T. Werner,et al.  Finding protein-binding sites in DNA sequences: the next generation. , 1997, Trends in biochemical sciences.

[54]  T. Werner,et al.  A novel method to develop highly specific models for regulatory units detects a new LTR in GenBank which contains a functional promoter. , 1997, Journal of molecular biology.

[55]  Substitution of just five nucleotides at and around the transcription start site of rat β‐actin promoter is sufficient to render the resulting transcript a subject for translational control , 1997, FEBS letters.

[56]  J. Stamatoyannopoulos,et al.  Sheltering of gamma-globin expression from position effects requires both an upstream locus control region and a regulatory element 3' to the A gamma-globin gene , 1997, Molecular and cellular biology.

[57]  R. Zarnegar,et al.  A novel transcriptional regulatory region within the core promoter of the hepatocyte growth factor gene is responsible for its inducibility by cytokines via the C/EBP family of transcription factors , 1997, Molecular and cellular biology.

[58]  P. Bucher,et al.  Searching for regulatory elements in human noncoding sequences. , 1997, Current opinion in structural biology.

[59]  N. Nomura,et al.  Prediction of the coding sequences of unidentified human genes. IX. The complete sequences of 100 new cDNA clones from brain which can code for large proteins in vitro. , 1998, DNA research : an international journal for rapid publication of reports on genes and genomes.

[60]  J. Fickett,et al.  Identification of regulatory regions which confer muscle-specific gene expression. , 1998, Journal of molecular biology.

[61]  Philipp Bucher,et al.  The Eukaryotic Promoter Database EPD , 1998, Nucleic Acids Res..

[62]  T. Heinemeyer,et al.  Databases on transcriptional regulation: TRANSFAC, TRRD and COMPEL , 1998, Nucleic Acids Res..

[63]  Thomas Werner,et al.  Muscle actin genes: A first step towards computational classification of tissue specific promoters , 1998, Silico Biol..

[64]  G. Lavorgna,et al.  Detection of potential target genes in silico? , 1998, Trends in Genetics.

[65]  C. Glass,et al.  Transcription factor-specific requirements for coactivators and their acetyltransferase functions. , 1998, Science.

[66]  G. Crabtree,et al.  Architectural DNA binding by a high-mobility-group/kinesin-like subunit in mammalian SWI/SNF-related complexes. , 1998, Proceedings of the National Academy of Sciences of the United States of America.