Molecular Profiling of RNA Tumors Using High-Throughput RNA Sequencing: From Raw Data to Systems Level Analyses

RNA-seq is a powerful technique that enables global profiling of transcriptomes in healthy and diseased states. In this chapter we review pipelines for analyzing the data generated by sequencing RNA, from raw data to a systems-level analysis. We first give an overview of the workflow that generates mapped reads from FASTQ files, including quality control of the FASTQ files, filtering and trimming of reads, and alignment of reads to a reference genome. We then compare and contrast three popular options for determining differentially expressed (DE) transcripts: the Tuxedo pipeline, DESeq2, and limma/voom. Finally, we examine four tool sets for deriving biological meaning from the list of DE genes: GeneCards, the Human Protein Atlas, GSEA, and ToppGene. We emphasize the need to ask a concise scientific question and to clearly understand the strengths and limitations of each method.
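To make the read filtering and trimming step concrete, here is a minimal Python sketch; it is not the chapter's own pipeline. It assumes Sanger/Illumina 1.8+ FASTQ quality encoding (ASCII offset 33) and applies a 3' trimming rule modeled on the running-sum algorithm that cutadapt documents for quality trimming (a rule that originates in BWA). The function names (`parse_fastq`, `phred_scores`, `trim_3prime_index`, `trim_and_filter`), the Phred cutoff of 20, and the minimum length of 20 are hypothetical, illustrative choices; a real workflow would use a dedicated tool.

```python
from typing import Iterator, List, Tuple

# Assumption: Sanger / Illumina 1.8+ encoding, where each quality character
# encodes a Phred score as its ASCII code minus 33.
PHRED_OFFSET = 33


def parse_fastq(lines: List[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (read name, sequence, quality string) from 4-line FASTQ records."""
    for i in range(0, len(lines) - 3, 4):
        header, seq, plus, qual = (line.rstrip("\n") for line in lines[i:i + 4])
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record starting at line {i + 1}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch at line {i + 1}")
        yield header[1:], seq, qual


def phred_scores(qual: str, offset: int = PHRED_OFFSET) -> List[int]:
    """Decode a quality string into integer Phred scores."""
    return [ord(ch) - offset for ch in qual]


def trim_3prime_index(scores: List[int], cutoff: int) -> int:
    """Return how many leading bases to keep after 3' quality trimming.

    Walk from the 3' end, accumulating (cutoff - score). Stop once the running
    sum goes negative, and cut where the sum was largest (BWA-style rule).
    """
    running, best, stop = 0, 0, len(scores)
    for i in range(len(scores) - 1, -1, -1):
        running += cutoff - scores[i]
        if running < 0:
            break
        if running > best:
            best, stop = running, i
    return stop


def trim_and_filter(lines: List[str], cutoff: int = 20,
                    min_length: int = 20) -> List[Tuple[str, str, str]]:
    """Quality-trim every read, then drop reads shorter than min_length."""
    kept = []
    for name, seq, qual in parse_fastq(lines):
        keep = trim_3prime_index(phred_scores(qual), cutoff)
        if keep >= min_length:
            kept.append((name, seq[:keep], qual[:keep]))
    return kept
```

For example, for a 13-base read with quality string `IIIIIIIIII###` (ten Q40 bases followed by three Q2 bases), `trim_3prime_index` returns 10, so the three low-quality 3' bases are removed; with the default `min_length=20` the trimmed read would then be discarded.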
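One step shared by count-based DE methods such as DESeq2 is correcting for differences in sequencing depth between samples. DESeq2's default normalization, the median-of-ratios size factors computed by `estimateSizeFactors`, can be illustrated in a few lines. The sketch below is a plain-Python re-implementation of that single step, assuming a gene-by-sample table of raw integer counts. It does not fit DESeq2's negative binomial model, estimate dispersions, or shrink fold changes, and the function names `size_factors` and `normalize` are hypothetical.

```python
import math
from statistics import median
from typing import Dict, List


def size_factors(counts: Dict[str, List[int]]) -> List[float]:
    """Median-of-ratios size factors for a gene -> per-sample counts table."""
    n_samples = len(next(iter(counts.values())))
    log_ratios: List[List[float]] = [[] for _ in range(n_samples)]
    for gene, row in counts.items():
        if len(row) != n_samples:
            raise ValueError(f"gene {gene} has {len(row)} samples, expected {n_samples}")
        if any(c == 0 for c in row):
            continue  # geometric mean is zero, so this gene gives no usable ratio
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for j, c in enumerate(row):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    if not log_ratios[0]:
        raise ValueError("every gene has a zero count in at least one sample")
    # Take the median on the log scale, then exponentiate back.
    return [math.exp(median(r)) for r in log_ratios]


def normalize(counts: Dict[str, List[int]],
              factors: List[float]) -> Dict[str, List[float]]:
    """Divide each sample's counts by that sample's size factor."""
    return {gene: [c / s for c, s in zip(row, factors)]
            for gene, row in counts.items()}
```

In a toy table such as `{"g1": [10, 20], "g2": [100, 200], "g3": [5, 10]}`, the second sample is simply sequenced twice as deeply. The size factors (about 0.71 and 1.41) remove that depth difference, so each gene's normalized counts become equal across the two samples.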
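Over-representation tools such as ToppGene ask whether a DE gene list overlaps an annotated gene set more than chance would predict. They commonly answer with a hypergeometric test followed by multiple-testing correction. The plain-Python sketch below shows that idea with a one-sided hypergeometric p-value and Benjamini-Hochberg adjustment. It is not ToppGene's implementation, and the gene-set dictionary and the function names (`hypergeom_sf`, `benjamini_hochberg`, `enrich`) are hypothetical. It also does not reproduce GSEA, which scores a ranked gene list with a running-sum statistic instead of testing a fixed list.

```python
from math import comb
from typing import Dict, List, Set, Tuple


def hypergeom_sf(k: int, population: int, successes: int, draws: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws)."""
    upper = min(successes, draws)
    hits = sum(comb(successes, i) * comb(population - successes, draws - i)
               for i in range(k, upper + 1))
    return hits / comb(population, draws)


def benjamini_hochberg(pvals: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def enrich(de_genes: Set[str], universe: Set[str],
           gene_sets: Dict[str, Set[str]]) -> List[Tuple[str, int, float, float]]:
    """Return (set name, overlap, p-value, BH-adjusted p-value), sorted by p-value."""
    de = de_genes & universe  # only genes that could have been measured count
    names, overlaps, pvals = [], [], []
    for name, members in gene_sets.items():
        members = members & universe
        k = len(de & members)
        names.append(name)
        overlaps.append(k)
        pvals.append(hypergeom_sf(k, len(universe), len(members), len(de)))
    qvals = benjamini_hochberg(pvals)
    return sorted(zip(names, overlaps, pvals, qvals), key=lambda row: row[2])
```

The choice of background (`universe`) matters as much as the test itself. It should be the set of genes that were actually measured and could have been called DE, not the whole genome.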
