Ab initio construction of a eukaryotic transcriptome by massively parallel mRNA sequencing

Defining the transcriptome, the repertoire of transcribed regions encoded in the genome, is a challenging experimental task. Current approaches, relying on sequencing of ESTs or cDNA libraries, are expensive and labor-intensive. Here, we present a general approach for ab initio discovery of the complete transcriptome of the budding yeast, based only on the unannotated genome sequence and millions of short reads from a single massively parallel sequencing run. Using novel algorithms, we automatically construct a highly accurate transcript catalog. Our approach automatically and fully defines 86% of the genes expressed under the given conditions, and discovers 160 previously undescribed transcription units of 250 bp or longer. It correctly demarcates the 5′ and 3′ UTR boundaries of 86 and 77% of expressed genes, respectively. The method further identifies 83% of known splice junctions in expressed genes, and discovers 25 previously uncharacterized introns, including 2 cases of condition-dependent intron retention. Our framework is applicable to poorly understood organisms, and can lead to greater understanding of the transcribed elements in an explored genome.

[1]  M. Rosbash,et al.  Number and distribution of polyadenylated RNA sequences in yeast , 1977, Cell.

[2]  G. Gerard,et al.  Second-strand cDNA synthesis with E. coli DNA polymerase I and RNase H: the fate of information at the mRNA 5' terminus and the effect of E. coli DNA ligase. , 1988, Nucleic acids research.

[3]  J. Boeke,et al.  Designer deletion strains derived from Saccharomyces cerevisiae S288C: A useful set of strains and plasmids for PCR‐mediated gene disruption and other applications , 1998, Yeast.

[4]  David Botstein,et al.  SGD: Saccharomyces Genome Database , 1998, Nucleic Acids Res..

[5]  Michael R. Green,et al.  Dissecting the Regulatory Circuitry of a Eukaryotic Genome , 1998, Cell.

[6]  Tom H. Pringle,et al.  The human genome browser at UCSC. , 2002, Genome research.

[7]  Jonathan E. Allen,et al.  Genome sequence of the human malaria parasite Plasmodium falciparum , 2002, Nature.

[8]  N. Friedman,et al.  Single-Nucleosome Mapping of Histone Modifications in S. cerevisiae , 2005, PLoS biology.

[9]  D. Haussler,et al.  Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. , 2005, Genome research.

[10]  James R. Knight,et al.  Genome sequencing in microfabricated high-density picolitre reactors , 2005, Nature.

[11]  Wolfgang Huber,et al.  A high-resolution map of transcription in the yeast genome. , 2006, Proceedings of the National Academy of Sciences of the United States of America.

[12]  M. Hattori,et al.  A large-scale full-length cDNA analysis to explore the budding yeast transcriptome , 2006, Proceedings of the National Academy of Sciences.

[13]  D. Bentley,et al.  Whole-genome re-sequencing. , 2006, Current opinion in genetics & development.

[14]  N. Friedman,et al.  Natural history and evolutionary principles of gene duplication in fungi , 2007, Nature.

[15]  S. Ranade,et al.  Stem cell transcriptome profiling via massive-scale mRNA sequencing , 2008, Nature Methods.

[16]  Eric T. Wang,et al.  Alternative Isoform Regulation in Human Tissue Transcriptomes , 2008, Nature.

[17]  M. Gerstein,et al.  The Transcriptional Landscape of the Yeast Genome Defined by RNA Sequencing , 2008, Science.

[18]  Gabor T. Marth,et al.  Whole-genome sequencing and variant discovery in C. elegans , 2008, Nature Methods.

[19]  I. Goodhead,et al.  Dynamic repertoire of a eukaryotic transcriptome surveyed at single-nucleotide resolution , 2008, Nature.

[20]  Weidong Tian,et al.  Isoform discovery by targeted cloning, 'deep-well' pooling and parallel sequencing , 2008, Nature Methods.

[21]  R. Durbin,et al.  Mapping Quality Scores Mapping Short Dna Sequencing Reads and Calling Variants Using P

, 2022 .

[22]  B. Williams,et al.  Mapping and quantifying mammalian transcriptomes by RNA-Seq , 2008, Nature Methods.