A code for transcription initiation in mammalian genomes.

Genome-wide detection of transcription start sites (TSSs) has revealed that RNA polymerase II transcription initiates at millions of positions in mammalian genomes. Most core promoters do not have a single TSS but an array of closely spaced TSSs with different rates of initiation. As a rule, genes have more than one such core promoter; however, defining the boundaries between core promoters is not trivial. These discoveries prompt a re-evaluation of our models for transcription initiation. We describe a new framework for understanding the organization of transcription initiation. We show that initiation events are clustered on the chromosomes at multiple scales (clusters within clusters), indicating multiple regulatory processes. Within the smallest of such clusters, which can be interpreted as core promoters, the local DNA sequence predicts the relative transcription start usage of each nucleotide with a remarkable 91% accuracy, implying the existence of a DNA code that determines TSS selection. In contrast, the total expression strength of such clusters is only partially determined by the local DNA sequence. Thus, the overall control of transcription can be understood as a combination of large- and small-scale effects: the selection of transcription start sites is largely governed by the local DNA sequence, whereas the transcriptional activity of a locus is regulated at a different level and is affected by distal features or events such as enhancers and chromatin remodeling.
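
The idea of "clusters within clusters" can be illustrated with a minimal sketch. It is not the procedure used in the paper: it assumes a simple gap-based rule in which neighbouring TSS positions closer than a chosen distance are merged, applied at two arbitrary scales. The function names, the coordinates, and the gap values (10 and 100) are all hypothetical and chosen only for illustration.

```python
from typing import Dict, Iterable, List


def cluster_by_gap(positions: Iterable[int], max_gap: int) -> List[List[int]]:
    """Group positions so that consecutive members of a cluster
    are at most max_gap nucleotides apart (hypothetical rule)."""
    clusters: List[List[int]] = []
    for pos in sorted(positions):
        if clusters and pos - clusters[-1][-1] <= max_gap:
            clusters[-1].append(pos)   # extend the current cluster
        else:
            clusters.append([pos])     # start a new cluster
    return clusters


def nested_clusters(positions: Iterable[int],
                    gaps: Iterable[int]) -> Dict[int, List[List[int]]]:
    """Cluster the same positions at several gap scales, smallest first."""
    pos_list = list(positions)
    return {gap: cluster_by_gap(pos_list, gap) for gap in sorted(gaps)}


# Toy TSS coordinates on one chromosome (made up for illustration).
tss = [100, 103, 105, 160, 162, 5000, 5004]
levels = nested_clusters(tss, gaps=[10, 100])

for gap, clusters in levels.items():
    print(f"max_gap={gap}: {len(clusters)} clusters -> {clusters}")
```

With these toy values, the small scale separates three tight groups that could be read as core promoters, while the larger scale merges the first two into one broader cluster. Because each larger-scale cluster is a union of smaller-scale clusters, the result is nested by construction.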
