Extension of Partial Gene Transcripts by Iterative Mapping of RNA-Seq Raw Reads

Many non-model organisms lack reference genomes, and sequencing and de novo assembly of an organism's transcriptome is an affordable means of characterizing the coding component of its genome. Despite the advances that have made this possible, assembling a transcriptome without a known reference usually yields a mixture of full-length and partial gene transcripts. Downstream analysis of genes represented only by partial transcripts then often requires further laboratory work to obtain full-length sequences. We explored whether partial transcripts encoding genes of interest, present in de novo assembled transcriptomes of a model and a non-model insect species, could be extended by iterative mapping against the raw transcriptome sequencing reads. Partial sequences encoding cytochrome P450s and carboxyl/cholinesterases were used in this analysis because these are large multigene families whose members exhibit significant variation in expression. We present an effective method to improve the contiguity of partial transcripts in silico that, in the absence of a reference genome, may be a quick and cost-effective alternative to extending them by laboratory experimentation. Our approach successfully extended incompletely assembled transcripts, often to full length. We validated these results both in silico and experimentally, using real-time PCR and sequencing.
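The idea of iteratively extending a partial transcript with raw reads can be illustrated with a minimal Python sketch. It is a toy under stated assumptions, not the pipeline used in this work: exact string matching stands in for a read mapper, all reads are assumed to be error-free and already in the transcript's orientation, and the function names and the `min_overlap` and `max_rounds` parameters are hypothetical.

```python
def extend_right(seed, reads, min_overlap=20):
    """Extend the 3' end of seed with the longest overhang among reads whose
    prefix exactly matches a suffix of seed (at least min_overlap bases)."""
    best = ""
    for read in reads:
        # Leave at least one base of the read to hang past the seed's end.
        max_ov = min(len(seed), len(read) - 1)
        # Try the longest overlap first and keep the first one that matches.
        for ov in range(max_ov, min_overlap - 1, -1):
            if seed.endswith(read[:ov]):
                overhang = read[ov:]
                if len(overhang) > len(best):
                    best = overhang
                break
    return seed + best


def extend_left(seed, reads, min_overlap=20):
    """Extend the 5' end by running extend_right on reversed strings."""
    reversed_reads = [r[::-1] for r in reads]
    return extend_right(seed[::-1], reversed_reads, min_overlap)[::-1]


def extend_iteratively(seed, reads, min_overlap=20, max_rounds=100):
    """Repeat right and left extension until the sequence stops growing
    or max_rounds is reached (a guard against endless growth on repeats)."""
    for _ in range(max_rounds):
        extended = extend_right(seed, reads, min_overlap)
        extended = extend_left(extended, reads, min_overlap)
        if extended == seed:
            break
        seed = extended
    return seed


if __name__ == "__main__":
    # Invented 40-base "transcript" and overlapping 15-base "reads".
    transcript = "ATGGCGTACCTGAAGTCTTAGGCACGATTCGGAACTTGCA"
    reads = [transcript[i:i + 15] for i in range(0, 26, 5)]
    partial = transcript[12:28]  # the incompletely assembled fragment
    print(extend_iteratively(partial, reads, min_overlap=8))
```

Each round recruits only reads that overlap the current ends of the sequence, so the extension grows outward step by step. A realistic implementation would additionally have to handle sequencing errors, reverse-complement reads, and paralogous members of multigene families, none of which this sketch attempts.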
