5' Long serial analysis of gene expression (LongSAGE) and 3' LongSAGE for transcriptome characterization and genome annotation.

Complete genome annotation relies on precise identification of transcription units, each bounded by a transcription initiation site (TIS) and a polyadenylation site (PAS). To facilitate this process, we developed two complementary methods, 5' Long serial analysis of gene expression (5'LS) and 3' LongSAGE (3'LS). These analyses are based on the original SAGE and LongSAGE methods coupled with full-length cDNA cloning, and they enable high-throughput extraction of the first and the last 20 bp of each transcript. We demonstrate that mapping 5'LS and 3'LS tags to the genome allows the localization of the TIS and PAS. Using 537 tag pairs that map to regions of known genes, we confirmed that >90% of the tag pairs were appropriately assigned to the first and last exons. Moreover, by using tag sequences as primers for RT-PCR, we were able to recover putative full-length transcripts in 81% of the attempts. This large-scale generation of transcript terminal tags is at least 20-40 times more efficient than full-length cDNA cloning and sequencing for identifying complete transcription units. The apparent precision and deep coverage make 5'LS and 3'LS an advanced approach for genome annotation through whole-transcriptome characterization.
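
The tag-pair idea described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the toy sequences, and the use of exact forward-strand string matching are all assumptions made for illustration, whereas real tag mapping would use an alignment tool and consider both strands. The sketch takes the first and last 20 bp of a transcript and uses their genomic positions to bound a putative transcription unit.

```python
TAG_LENGTH = 20  # tag size stated in the abstract


def terminal_tags(transcript, tag_length=TAG_LENGTH):
    """Return the 5' and 3' terminal tags (first and last tag_length bases)."""
    if len(transcript) < 2 * tag_length:
        raise ValueError("transcript is too short to yield two distinct tags")
    return transcript[:tag_length], transcript[-tag_length:]


def locate_transcription_unit(genome, five_tag, three_tag):
    """Bound a putative transcription unit by exact forward-strand matches.

    Returns (start, end) as a half-open interval on the genome string,
    or None if either tag cannot be placed. Hypothetical helper: exact
    matching stands in for a real alignment step.
    """
    start = genome.find(five_tag)
    if start == -1:
        return None
    # Search for the 3' tag only downstream of the 5' tag.
    three_pos = genome.find(three_tag, start + len(five_tag))
    if three_pos == -1:
        return None
    return start, three_pos + len(three_tag)


# Toy two-exon gene (made-up sequences, for illustration only).
exon1 = "ATGGCCTTAGGCATCGATCGGTACCA"
intron = "GTAAGTCCCCCCCCCCTTTTAG"
exon2 = "GGATTCAGCTAGCTTGACCGTAAATAAAGC"
genome = "CCCC" + exon1 + intron + exon2 + "GGGG"
transcript = exon1 + exon2  # spliced transcript without the intron

five_tag, three_tag = terminal_tags(transcript)
unit = locate_transcription_unit(genome, five_tag, three_tag)
```

In this toy case the two tags fall in the first and last exons, so the interval in `unit` spans the whole gene, intron included, even though the intron is absent from the transcript itself.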