Diversification of transcriptional modulation: large-scale identification and characterization of putative alternative promoters of human genes.

By analyzing 1,780,295 5'-end sequences of human full-length cDNAs derived from 164 different oligo-capped cDNA libraries, we identified 269,774 independent transcriptional start site (TSS) positions for 14,628 human RefSeq genes. These TSSs were grouped into 30,964 clusters separated from one another by more than 500 bp, which are therefore very likely to constitute mutually distinct alternative promoters. Surprisingly, at least 7674 (52%) of these RefSeq genes were subject to regulation by putative alternative promoters (PAPs). On average, there were 3.1 PAPs per gene, with one CpG-island-containing promoter for every 2.6 CpG-less promoters. Tissue-specific use of PAPs was observed in 17% of the PAP-containing loci, and testis and brain were the richest tissue sources of tissue-specific PAPs. Interestingly, PAP-containing genes were enriched among genes encoding signal transduction-related proteins and underrepresented among genes encoding extracellular proteins, possibly reflecting the diverse functional requirements of the former and the restricted expression of the latter. First-exon patterns were also highly diverse: on average, each locus had 7.7 different first-exon splicing types, partly produced by the PAPs, suggesting that this mechanism can generate a wide variety of transcripts. Our findings suggest that the use of alternative promoters, and the consequent alternative use of first exons, likely plays a pivotal role in generating the complexity required for the highly elaborate molecular systems of humans.
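The clustering rule described above (TSSs grouped into clusters separated by more than 500 bp) can be illustrated with a minimal single-linkage sketch. It assumes that TSS positions are already mapped to genomic coordinates on a single strand per gene and that consecutive TSSs no more than 500 bp apart belong to the same cluster. The function names `cluster_tss` and `summarize` and the toy gene data are hypothetical illustrations, not the authors' pipeline.

```python
from typing import Dict, Iterable, List, Tuple

# Assumed threshold taken from the abstract: clusters are separated by > 500 bp.
MIN_GAP_BP = 500


def cluster_tss(positions: Iterable[int], min_gap: int = MIN_GAP_BP) -> List[Tuple[int, int]]:
    """Group the TSS positions of one gene (single strand assumed) into clusters.

    Positions are sorted and deduplicated; a new cluster starts whenever the
    distance to the end of the previous cluster exceeds `min_gap`.
    Returns the (start, end) span of each cluster.
    """
    clusters: List[Tuple[int, int]] = []
    for pos in sorted(set(positions)):
        if clusters and pos - clusters[-1][1] <= min_gap:
            start, _ = clusters[-1]
            clusters[-1] = (start, pos)  # extend the current cluster
        else:
            clusters.append((pos, pos))  # gap > min_gap: open a new cluster
    return clusters


def summarize(tss_by_gene: Dict[str, Iterable[int]]) -> Tuple[int, int, float]:
    """Return (genes analyzed, genes with >= 2 clusters, mean clusters per such gene)."""
    counts = {gene: len(cluster_tss(pos)) for gene, pos in tss_by_gene.items()}
    multi = [n for n in counts.values() if n >= 2]
    mean_multi = sum(multi) / len(multi) if multi else 0.0
    return len(counts), len(multi), mean_multi


if __name__ == "__main__":
    # Hypothetical toy data, not taken from the study.
    toy = {
        "GENE_A": [1000, 1010, 1200, 5000, 5005, 9000],
        "GENE_B": [200, 650],
    }
    print(cluster_tss(toy["GENE_A"]))  # [(1000, 1200), (5000, 5005), (9000, 9000)]
    print(summarize(toy))              # (2, 1, 3.0)
```

In this sketch a gap of exactly 500 bp keeps two TSSs in the same cluster, matching the "more than 500 bp" separation criterion. As a rough consistency check, if the 3.1 average is read as taken over PAP-containing genes only, the 6954 genes without PAPs contribute one cluster each, leaving (30,964 − 6954) / 7674 ≈ 3.13 clusters per PAP-containing gene.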