Gene Expression Analysis Using RNA-Seq from Organisms Lacking Substantial Genomic Resources

Development of massively parallel “next generation” sequencing technology (NGS) has revolutionized biological studies. Among the many applications of NGS, RNA-Seq is one of the most important. RNA-Seq enables investigators to accurately probe the current state of a transcriptome and to assess many biologically important questions, such as gene expression levels, differential splicing events, and allele-specific gene expression. Compared with previous technologies (e.g., microarrays), NGS has the clear advantage of not being limited to experimental systems that have well-characterized genomes or transcript sequence libraries. This positions RNA-Seq approaches as important and versatile techniques for experimental systems and species where specific genetic information is limited or altogether lacking. A major goal of most transcriptomic studies is the identification and characterization of all transcripts within a developmental stage or specific tissue. NGS techniques have made the massive amount of data required to carry out such studies both inexpensive and available to an unprecedented extent. Efficient assembly algorithms have made the assembly of these massive data sets the work of one or two people with reasonably powerful workstations or a moderate analytical server. Once a reference transcriptome has been assembled, analyses can be carried out in several steps: mapping short sequence reads to the transcriptome, quantifying the abundance of genes or gene sets, and comparing differential expression patterns among all samples. Herein we outline the process from obtaining raw short-read data to advanced comparative gene expression analysis, and we review currently available bioinformatic programs, such as TopHat, Cufflinks, and DESeq, that are specifically designed to address each of the above steps. We discuss both the accuracy and the ease of use of these tools for biologists beginning to pursue these types of analyses.
In addition to individual programs, we also discuss the integration of multiple programs into pipelines for more rapid and complete expression analyses. Overall, future applications of RNA-Seq will open new avenues for transcriptome analyses of less well-studied and/or wild-caught species that could not previously have been approached. This will yield a wealth of new comparative data highlighting the many ways plants and animals have adapted to survive in a rapidly changing environment.
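
The quantification and comparison steps named above can be illustrated with a toy calculation. The sketch below is not the method implemented by Cufflinks or DESeq (DESeq, for example, fits a negative binomial model to raw counts); it only shows the general idea of normalizing mapped-read counts by transcript length and sequencing depth (RPKM) and then comparing two samples by log2 fold change. The transcript names, lengths, counts, function names, and the pseudocount of 1 are all hypothetical values chosen for illustration.

```python
import math


def rpkm(counts, lengths):
    """Reads per kilobase of transcript per million mapped reads.

    counts  -- dict of transcript id -> number of mapped reads
    lengths -- dict of transcript id -> transcript length in bases
    """
    total_mapped = sum(counts.values())
    return {
        tx: n * 1e9 / (lengths[tx] * total_mapped)
        for tx, n in counts.items()
    }


def log2_fold_change(reference, test, pseudocount=1.0):
    """log2(test / reference) per transcript.

    The pseudocount (an assumption of this sketch) avoids division
    by zero for transcripts with no expression in one sample.
    """
    return {
        tx: math.log2((test[tx] + pseudocount) / (reference[tx] + pseudocount))
        for tx in reference
    }


# Hypothetical toy data: read counts per assembled transcript.
lengths = {"tx1": 1000, "tx2": 2000, "tx3": 500}
control = {"tx1": 100, "tx2": 400, "tx3": 500}
treated = {"tx1": 400, "tx2": 400, "tx3": 200}

rpkm_control = rpkm(control, lengths)
rpkm_treated = rpkm(treated, lengths)
lfc = log2_fold_change(rpkm_control, rpkm_treated)

for tx in sorted(lfc):
    print(f"{tx}\t{rpkm_control[tx]:.1f}\t{rpkm_treated[tx]:.1f}\t{lfc[tx]:+.2f}")
```

A real analysis would start from the output of a read mapper and would use replicate samples with a statistical test rather than a bare fold change; the sketch only makes the arithmetic behind "abundance" and "differential expression" concrete.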
