RNAseq versus genome-predicted transcriptomes: a large population of novel transcripts identified in an Illumina-454 Hydra transcriptome

Background
Evolutionary studies benefit from deep sequencing technologies that generate genomic and transcriptomic sequences from a variety of organisms. Genome sequencing and RNAseq have complementary strengths. In this study, we present the assembly of the most complete Hydra transcriptome to date, along with a comparative analysis of the specific features of the RNAseq and genome-predicted transcriptomes currently available for the freshwater hydrozoan Hydra vulgaris.

Results
To produce an accurate and extensive Hydra transcriptome, we combined Illumina and 454 Titanium reads, giving Illumina reads primacy over 454 reads in order to correct homopolymer errors. This strategy yielded an RNAseq transcriptome that contains 48’909 unique sequences, including splice variants, representing approximately 24’450 distinct genes. Comparison with the available genome-predicted transcriptomes identified 10’597 novel Hydra transcripts, which encode 529 evolutionarily conserved proteins. The annotation of 170 human orthologs points to critical functions in protein biosynthesis, FGF and TOR signaling, vesicle transport, immunity, cell cycle regulation, cell death, mitochondrial metabolism, and transcription and chromatin regulation. However, a majority of these novel transcripts encode short open reading frames (ORFs), at least 767 of which correspond to pseudogenes. This RNAseq transcriptome also lacks 11’270 predicted transcripts, which correspond either to silent genes or to genes expressed below the detection level of this study.

Conclusions
We established a simple and powerful strategy to combine Illumina and 454 reads, and we produced, with genome assistance, an extensive and accurate Hydra transcriptome. The comparative analysis of the RNAseq transcriptome with genome-predicted transcriptomes led to the identification of large populations of novel as well as missing transcripts that might reflect Hydra-specific evolutionary events.
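The Results paragraph states that Illumina reads were given primacy over 454 reads to correct homopolymer errors, the characteristic error of 454 pyrosequencing. The abstract does not say how this was implemented, so what follows is only a minimal sketch under an assumed setup: a 454-derived sequence and an Illumina-derived sequence covering the same region are already available as strings (the alignment step is not shown). The helper names `run_length_collapse`, `differs_only_in_homopolymers` and `reconcile` are hypothetical. The idea is that if both sequences have the same base order once homopolymer runs are collapsed, they differ only in run lengths, and the Illumina version is kept.

```python
from itertools import groupby
from typing import Optional


def run_length_collapse(seq: str) -> list[tuple[str, int]]:
    """Collapse a sequence into (base, run_length) pairs, e.g. 'AAAC' -> [('A', 3), ('C', 1)]."""
    return [(base, len(list(run))) for base, run in groupby(seq.upper())]


def differs_only_in_homopolymers(seq_454: str, seq_illumina: str) -> bool:
    """True if both sequences have the same base order once homopolymer runs are collapsed."""
    order_454 = [base for base, _ in run_length_collapse(seq_454)]
    order_illumina = [base for base, _ in run_length_collapse(seq_illumina)]
    return order_454 == order_illumina


def reconcile(seq_454: str, seq_illumina: Optional[str]) -> str:
    """Hypothetical rule: prefer Illumina when the two disagree only in homopolymer lengths."""
    if seq_illumina is None:
        # No Illumina support for this region: keep the 454 sequence unchanged.
        return seq_454
    if differs_only_in_homopolymers(seq_454, seq_illumina):
        # Same base order, different run lengths: trust the Illumina run lengths.
        return seq_illumina
    # Any other disagreement (e.g. substitutions) is outside the scope of this sketch.
    return seq_454
```

For example, `reconcile("GATTTTACA", "GATTTACA")` returns the Illumina sequence `"GATTTACA"`, because both collapse to the same base order G-A-T-A-C-A. Comparing collapsed base orders keeps the check simple, but it cannot tell which run is wrong when substitutions are also present; this sketch leaves such cases unchanged.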
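The comparative analysis sorts transcripts into three groups: those shared by the RNAseq and genome-predicted transcriptomes, novel transcripts found only in the RNAseq data, and predicted transcripts missing from it. The abstract does not give the matching criteria, so the sketch below assumes a precomputed table of pairwise hits (for instance parsed from a BLAST tabular report) reduced to `(rnaseq_id, predicted_id, percent_identity)` triples. The function name `classify_transcripts` and the 95% identity cutoff are hypothetical placeholders, not the study's parameters.

```python
from typing import Iterable


def classify_transcripts(
    rnaseq_ids: Iterable[str],
    predicted_ids: Iterable[str],
    hits: Iterable[tuple[str, str, float]],
    min_identity: float = 95.0,  # hypothetical cutoff, not taken from the study
) -> tuple[set[str], set[str], set[str]]:
    """Return (shared, novel, missing) transcript ID sets.

    shared  - RNAseq transcripts with at least one qualifying hit to a predicted transcript
    novel   - RNAseq transcripts with no qualifying hit
    missing - predicted transcripts not hit by any RNAseq transcript
    """
    rnaseq = set(rnaseq_ids)
    predicted = set(predicted_ids)
    hit_rnaseq: set[str] = set()
    hit_predicted: set[str] = set()
    for rnaseq_id, predicted_id, identity in hits:
        # Count a hit only if it passes the identity cutoff and both IDs are known.
        if identity >= min_identity and rnaseq_id in rnaseq and predicted_id in predicted:
            hit_rnaseq.add(rnaseq_id)
            hit_predicted.add(predicted_id)
    return hit_rnaseq, rnaseq - hit_rnaseq, predicted - hit_predicted
```

With RNAseq IDs `r1`, `r2`, `r3`, predicted IDs `p1`, `p2`, and hits `r1`–`p1` at 99% and `r2`–`p2` at 80%, only the first hit passes the cutoff, so `r2` and `r3` come out as novel and `p2` as missing. A single identity cutoff is the simplest possible criterion; a fuller comparison might also require a minimum alignment coverage.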
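The Results paragraph notes that most novel transcripts encode short ORFs. One simple way to flag such transcripts is to measure the longest ATG-initiated ORF on both strands and compare it with a length cutoff. This is a sketch, not the study's procedure: standard-code stop codons are assumed, ORFs that run off the end of the transcript without a stop codon are ignored, and the 100-codon default in the hypothetical `has_short_orf` helper is an arbitrary illustration.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code (assumption)
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def longest_orf_codons(seq: str) -> int:
    """Codons (start included, stop excluded) in the longest ATG...stop ORF on either strand."""
    seq = seq.upper()
    best = 0
    for strand in (seq, reverse_complement(seq)):
        for frame in range(3):
            start = None
            for i in range(frame, len(strand) - 2, 3):
                codon = strand[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # the first ATG of an open stretch gives its longest ORF
                elif start is not None and codon in STOP_CODONS:
                    best = max(best, (i - start) // 3)
                    start = None
            # An ORF still open at the end of the strand (no stop codon) is not counted.
    return best


def has_short_orf(seq: str, min_codons: int = 100) -> bool:
    """Hypothetical flag: True if no ORF reaches min_codons (arbitrary default)."""
    return longest_orf_codons(seq) < min_codons
```

For instance, `longest_orf_codons("ATGAAATAG")` is 2 (ATG and AAA, then the TAG stop). Both strands are searched because the orientation of an assembled transcript is not necessarily known; with strand-specific data the reverse-complement pass could be dropped.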
