Yanagi: Fast and interpretable segment-based alternative splicing and gene expression analysis

Background: Ultra-fast pseudo-alignment approaches are the tool of choice in transcript-level RNA sequencing (RNA-seq) analyses. Unfortunately, these methods couple the tasks of pseudo-alignment and transcript quantification. This coupling prevents pseudo-alignment from being used directly in other expression analyses, such as alternative splicing or differential gene expression analysis, without an otherwise non-essential transcript quantification step.

Results: In this paper, we introduce a transcriptome segmentation approach to decouple these two tasks. We propose an efficient algorithm that generates maximal disjoint segments from a transcriptome reference library; ultra-fast pseudo-alignment against these segments then produces per-sample segment counts. We show how to apply these maximally unambiguous count statistics in two specific expression analyses, alternative splicing and differential gene expression, without the need for a transcript quantification step. Our experiments on simulated and experimental data show that segment counts, like other methods that rely on local coverage statistics, have an advantage over approaches based on transcript quantification for detecting and correctly estimating local splicing when transcript annotations are incomplete.

Conclusions: The transcriptome segmentation approach implemented in Yanagi exploits the computational and space efficiency of pseudo-alignment approaches. It significantly expands their applicability and interpretability in a variety of RNA-seq analyses by providing the means to model and capture local coverage variation.
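To make the idea of disjoint transcriptome segments concrete, the following is a minimal Python sketch under simplifying assumptions that are ours, not Yanagi's: each transcript is given as a list of half-open exon intervals on a single chromosome and strand, and a "segment" is taken to be a maximal stretch of exonic sequence shared by exactly the same set of transcripts. Yanagi's actual segments are built from the transcript sequences and can span splice junctions, which this sketch does not model. The function name `segment_transcriptome` and the toy transcripts are hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical representation: half-open [start, end) exon coordinates on one chromosome/strand.
Interval = Tuple[int, int]
Segment = Tuple[Interval, FrozenSet[str]]


def segment_transcriptome(transcripts: Dict[str, List[Interval]]) -> List[Segment]:
    """Simplified sketch: split exonic sequence into maximal disjoint segments,
    each covered by exactly the same set of transcripts (junction-spanning
    segments, as built by Yanagi, are not modelled here)."""
    # Coverage can only change at an exon start or end, so those are the cut points.
    breakpoints = sorted({p for exons in transcripts.values() for s, e in exons for p in (s, e)})
    segments: List[Segment] = []
    for start, end in zip(breakpoints, breakpoints[1:]):
        # Each elementary bin lies entirely inside or entirely outside every exon.
        covering = frozenset(
            name for name, exons in transcripts.items()
            if any(s <= start and end <= e for s, e in exons)
        )
        if not covering:
            continue  # intronic or intergenic bin: no transcript contains it
        prev = segments[-1] if segments else None
        if prev is not None and prev[0][1] == start and prev[1] == covering:
            # Contiguous and shared by the same transcripts: extend to keep segments maximal.
            segments[-1] = ((prev[0][0], end), covering)
        else:
            segments.append(((start, end), covering))
    return segments


# Toy example: T2 skips the middle exon; T3 uses an alternative 5' splice site
# on the first exon (assuming the plus strand).
example = {
    "T1": [(0, 100), (200, 300), (400, 500)],
    "T2": [(0, 100), (400, 500)],
    "T3": [(0, 150), (400, 500)],
}
for (start, end), names in segment_transcriptome(example):
    print(start, end, sorted(names))
# 0 100 ['T1', 'T2', 'T3']
# 100 150 ['T3']
# 200 300 ['T1']
# 400 500 ['T1', 'T2', 'T3']
```

Cutting at every exon start and end guarantees that each elementary bin is either fully inside or fully outside every exon, so the containment test is exact; adjacent bins with identical transcript sets are then merged so that each reported segment is maximal. In this toy case, the segment at 200 to 300 belongs only to T1, so its count directly reflects inclusion of the skipped exon.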
