PennDiff: detecting differential alternative splicing and transcription by RNA sequencing

Motivation: Alternative splicing and alternative transcription are major mechanisms for generating transcriptome diversity. Differential alternative splicing and transcription (DAST), which describes differential usage of transcript isoforms across conditions, can complement differential expression analysis in characterizing gene regulation. However, analyzing DAST is challenging because only a small fraction of RNA-seq reads is informative for distinguishing isoforms. Several methods have been developed to detect exon-based and gene-based DAST, but they lose power for genes with many isoforms.

Results: We present PennDiff, a novel statistical method that uses information on gene structures and pre-estimated isoform relative abundances to detect DAST from RNA-seq data. PennDiff has several advantages. First, grouping exons avoids multiple testing of ‘exons’ that originate from the same isoform(s). Second, it uses all available reads to estimate exon-inclusion levels, unlike methods that use only junction reads. Third, collapsing isoforms that share the same alternative exons reduces the impact of uncertainty in isoform expression estimates. PennDiff can detect DAST at both the exon and gene levels, offering more flexibility than existing methods. Simulations and analysis of a real RNA-seq dataset indicate that PennDiff has a well-controlled type I error rate and is more powerful than existing methods, including DEXSeq, rMATS, Cuffdiff, IUTA and SplicingCompass. As RNA-seq continues to grow in popularity, we expect PennDiff to be useful for diverse transcriptomics studies.

Availability and implementation: PennDiff source code and user guide are freely available for download at https://github.com/tigerhu15/PennDiff.

Supplementary information: Supplementary data are available at Bioinformatics online.
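The exon-grouping idea summarized under Results can be illustrated with a small sketch. It is not PennDiff's implementation: it assumes that exons belonging to exactly the same set of isoforms form one test unit, and that a group's inclusion level is simply the share of total relative abundance held by the isoforms containing it. The function names `group_alternative_exons` and `inclusion_levels` and the example gene are hypothetical.

```python
from collections import defaultdict


def group_alternative_exons(isoforms):
    """Group exons contained in exactly the same set of isoforms.

    isoforms: dict mapping isoform ID -> set of exon IDs (gene structure).
    Returns a dict mapping frozenset(isoform IDs) -> sorted list of exon IDs,
    keeping only alternative groups (exons missing from at least one isoform).
    """
    membership = defaultdict(set)  # exon ID -> isoforms that include it
    for iso, exons in isoforms.items():
        for exon in exons:
            membership[exon].add(iso)

    all_isoforms = frozenset(isoforms)
    groups = defaultdict(list)
    for exon, isos in membership.items():
        key = frozenset(isos)
        if key != all_isoforms:  # constitutive exons carry no splicing signal
            groups[key].append(exon)
    return {key: sorted(exons) for key, exons in groups.items()}


def inclusion_levels(groups, rel_abundance):
    """Assumed estimator (hypothetical simplification): fraction of total
    relative abundance contributed by the isoforms containing each group."""
    total = sum(rel_abundance.values())
    return {key: sum(rel_abundance[iso] for iso in key) / total for key in groups}


# Hypothetical gene with three isoforms and pre-estimated relative abundances.
isoforms = {
    "T1": {"E1", "E2", "E3", "E5"},
    "T2": {"E1", "E4", "E5"},
    "T3": {"E1", "E2", "E3", "E4", "E5"},
}
abundance = {"T1": 0.5, "T2": 0.3, "T3": 0.2}

groups = group_alternative_exons(isoforms)
levels = inclusion_levels(groups, abundance)
for key, exons in groups.items():
    print(sorted(key), exons, round(levels[key], 3))
```

In this toy gene, E2 and E3 appear in exactly the same isoforms (T1 and T3), so they collapse into one unit: three alternative exons yield two tests instead of three, with inclusion levels of about 0.7 and 0.5. Comparing such levels between conditions would require a proper statistical test, which this sketch does not attempt.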
