Data Analysis Pipeline for RNA‐seq Experiments: From Differential Expression to Cryptic Splicing

RNA sequencing (RNA‐seq) is a high‐throughput technology that provides unique insights into the transcriptome. Its applications include quantifying genes and isoforms and detecting non‐coding RNA, alternative splicing, and splice junctions. A thorough understanding of the cellular system requires comprehending the entire transcriptome. Several RNA‐seq analysis pipelines have been proposed to date; however, no single pipeline can capture the dynamics of the entire transcriptome. Here, we compile and present a robust and commonly used analytical pipeline covering the entire spectrum of transcriptome analysis, including quality checks, alignment of reads, differential gene/transcript expression analysis, discovery of cryptic splicing events, and visualization. Challenges, critical parameters, and possible downstream functional analysis pipelines associated with each step are highlighted and discussed. This unit provides a comprehensive overview of a state‐of‐the‐art RNA‐seq analysis pipeline and a greater understanding of the transcriptome. © 2017 by John Wiley & Sons, Inc.
