Identifying differential isoform abundance with RATs: a universal tool and a warning

Motivation
The biological importance of changes in gene and transcript expression is well recognised and is reflected by the wide variety of tools available to characterise these changes. Regulation via Differential Transcript Usage (DTU) is emerging as an important phenomenon. Several tools exist for the detection of DTU from read alignment or assembly data, but options for detecting DTU from alignment-free quantifications are limited.

Results
We present an R package named RATs (Relative Abundance of Transcripts) that identifies DTU transcriptome-wide directly from transcript abundance estimates. RATs is agnostic to the quantification method and exploits bootstrapped quantifications, if available, to inform the significance of detected DTU events. RATs contextualises the DTU results and shows good false discovery performance (median FDR ≤ 0.05) at all replication levels. We applied RATs to a human RNA-seq dataset on idiopathic pulmonary fibrosis in which three DTU events had been validated by qRT-PCR. Using the Ensembl v60 annotation, RATs found that all three genes exhibited statistically significant changes in isoform proportions. However, the DTU calls for two of them were not reliably reproduced across bootstrapped quantifications. RATs also identified 500 novel DTU events, which are enriched for eleven GO terms related to regulation of the response to stimulus, regulation of immune system processes, and symbiosis/parasitism. Repeating this analysis with the Ensembl v87 annotation showed that the isoform abundance profiles of two of the three validated DTU genes changed radically. With this annotation, RATs identified 414 novel DTU events enriched for five GO terms, none of which are shared with those identified previously. Only 141 of the DTU events are common to the two analyses, and only 8 are among the 248 reported by the original study. Furthermore, the original qRT-PCR probes no longer match uniquely to their original transcripts, which calls the interpretation of these data into question.
We suggest three possible ways to improve the validation of RNA-seq results in future experiments: parallel full-length isoform sequencing, annotation pre-filtering, and sequencing of the transcripts captured by qRT-PCR primers.

Availability
The package is available through GitHub at https://github.com/bartongroup/Rats.
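The Results describe two steps. The first is calling DTU directly from transcript abundance estimates. The second is using bootstrapped quantifications to judge how reliable each call is. The Python sketch below illustrates that idea for a single gene. It is not the RATs API: RATs is an R package, and its actual tests, thresholds and defaults are documented in its repository.

The sketch rests on these assumptions, all of which are labelled in the code:

* Isoform abundances are summed per condition.
* A G-test of independence on the conditions x isoforms table gives the gene-level p-value.
* A call also requires some isoform's proportion to shift by at least a hypothetical `min_prop_change`.
* Reproducibility is the fraction of paired bootstrap iterations in which the call still holds.

All function and parameter names are illustrative.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square(df) variable, integer df >= 1."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        # Even df: Q(df/2, y) = exp(-y) * sum_{i < df/2} y**i / i!
        return math.exp(-y) * sum(y ** i / math.factorial(i) for i in range(df // 2))
    # Odd df: Q(1/2, y) = erfc(sqrt(y)); then Q(a+1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a+1)
    total = math.erfc(math.sqrt(y))
    for j in range((df - 1) // 2):
        total += math.exp(-y) * y ** (j + 0.5) / math.gamma(j + 1.5)
    return total


def g_test(counts_a, counts_b):
    """G-test of independence on a 2 x k table (conditions x isoforms).

    Isoforms absent from both conditions are dropped so they do not
    inflate the degrees of freedom.
    """
    pairs = [(a, b) for a, b in zip(counts_a, counts_b) if a + b > 0]
    df = len(pairs) - 1
    if df < 1:
        return 0.0, 1.0
    rows = [[a for a, _ in pairs], [b for _, b in pairs]]
    row_tot = [sum(r) for r in rows]
    col_tot = [a + b for a, b in pairs]
    grand = sum(row_tot)
    g = 0.0
    for r, row in enumerate(rows):
        for c, obs in enumerate(row):
            if obs > 0:  # 0 * log(0) is taken as 0
                expected = row_tot[r] * col_tot[c] / grand
                g += obs * math.log(obs / expected)
    g *= 2.0
    return g, chi2_sf(g, df)


def proportions(counts):
    """Isoform proportions within a gene (assumes a non-zero gene total)."""
    total = sum(counts)
    return [c / total for c in counts]


def is_dtu(counts_a, counts_b, p_thresh=0.05, min_prop_change=0.2):
    """Hypothetical gene-level DTU call: significant AND a large enough shift."""
    _, p_value = g_test(counts_a, counts_b)
    shift = max(abs(pa - pb) for pa, pb in
                zip(proportions(counts_a), proportions(counts_b)))
    return p_value < p_thresh and shift >= min_prop_change


def bootstrap_reproducibility(boot_a, boot_b, **kwargs):
    """Fraction of paired bootstrap iterations in which the DTU call holds.

    boot_a, boot_b: one isoform-abundance vector per bootstrap iteration for
    each condition; pairing iterations by index is an assumption.
    """
    calls = [is_dtu(a, b, **kwargs) for a, b in zip(boot_a, boot_b)]
    return sum(calls) / len(calls)


if __name__ == "__main__":
    print(is_dtu([800, 200], [300, 700]))   # True: large, significant switch
    print(is_dtu([500, 500], [510, 490]))   # False: proportions shift by only 0.01
    print(bootstrap_reproducibility([[800, 200], [500, 500]],
                                    [[300, 700], [510, 490]]))  # 0.5
```

In a transcriptome-wide run, gene-level p-values would also need multiple-testing correction, for example Benjamini-Hochberg, before thresholding. A low reproducibility fraction flags a call that depends on quantification noise rather than on a stable change in isoform proportions. This is the situation described above for two of the three qRT-PCR-validated genes.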