Defining the transcriptome of nine vertebrate genomes using RNAseq

To the biomedical research community at large, understanding the content of the genome, particularly the genes, is critically important. The coding and non-coding transcriptomes of both human and mouse have been exquisitely refined through the use of massive amounts of species-specific cDNA sequence data, manual curation and, more recently, mRNA sequencing. However, all other vertebrate genome annotations are based largely on homology to human and mouse genes, because little cDNA sequence is available from other vertebrate species. With the advent of lower-cost RNAseq technology, the time is right to invest in RNA sequencing from other vertebrate species to improve genome annotations and to understand both the coding and non-coding transcriptomes. The Broad Institute currently has laboratory and computational methods in use and under further development to generate high-quality RNAseq data for genome annotation. We propose to use <0.02% of NHGRI's annual sequencing capacity at the Broad to improve nine important vertebrate genome annotations, including stickleback. At this time, a small investment in RNA sequencing would provide a large improvement in vertebrate genome annotation, enabling better research across a multitude of biomedical disciplines.