ASGAL: aligning RNA-Seq data to a splicing graph to detect novel alternative splicing events

Background: While the reconstruction of transcripts from a sample of RNA-Seq data is a computationally expensive and complicated task, the detection of splicing events from RNA-Seq data and a gene annotation is computationally feasible. This latter task, which is adequate for many transcriptome analyses, is usually achieved by aligning the reads to a reference genome and then comparing the alignments with a gene annotation, often implicitly represented by a graph: the splicing graph.

Results: We present ASGAL (Alternative Splicing Graph ALigner), a tool for mapping RNA-Seq data to the splicing graph, with the specific goal of detecting novel splicing events involving either annotated or unannotated splice sites. ASGAL takes as input the annotated transcripts of a gene and an RNA-Seq sample, and computes (1) the spliced alignment of each input read, and (2) a list of events that are novel with respect to the gene annotation.

Conclusions: An experimental analysis shows that ASGAL can enrich the annotation with novel alternative splicing events even when genes in an experiment express at most one isoform. Compared with other tools that use the spliced alignment of reads against a reference genome for differential analysis, ASGAL better predicts events that use splice sites which are novel with respect to a splicing graph, and achieves higher accuracy. To the best of our knowledge, ASGAL is the first tool that detects novel alternative splicing events by directly aligning reads to a splicing graph.

Availability: Source code, documentation, and data are available for download at http://asgal.algolab.eu.
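To make the notion of a splicing graph and of a "novel" event concrete, the following is a minimal illustrative sketch, not ASGAL's actual algorithm or API. It assumes exons are given as hypothetical (start, end) coordinate pairs on one gene, builds a graph whose nodes are annotated exons and whose edges join exons that are consecutive in some transcript, and then labels a junction observed in a spliced read as annotated or novel. The function names, the coordinates, and the three labels are assumptions introduced only for this example.

```python
def build_splicing_graph(transcripts):
    """Build a toy splicing graph from annotated transcripts.

    transcripts: dict mapping a transcript id to a list of (start, end) exons.
    Returns (nodes, edges): nodes are distinct exons, edges link exons that
    are consecutive in at least one annotated transcript.
    """
    nodes, edges = set(), set()
    for exons in transcripts.values():
        exons = sorted(exons)
        nodes.update(exons)
        edges.update(zip(exons, exons[1:]))
    return nodes, edges


def classify_junction(junction, edges):
    """Label a junction (exon end, next exon start) seen in a spliced read.

    The labels are illustrative assumptions, not ASGAL's event taxonomy.
    """
    donor, acceptor = junction
    annotated_introns = {(u[1], v[0]) for u, v in edges}
    if junction in annotated_introns:
        return "annotated"
    donors = {u[1] for u, _ in edges}      # exon ends followed by an intron
    acceptors = {v[0] for _, v in edges}   # exon starts preceded by an intron
    if donor in donors and acceptor in acceptors:
        # Both sites are known but never paired: compatible with exon skipping.
        return "novel junction, annotated splice sites"
    return "novel splice site"


# Hypothetical annotation: a single isoform with three exons.
annotation = {"T1": [(100, 200), (300, 400), (500, 600)]}
nodes, edges = build_splicing_graph(annotation)

print(classify_junction((200, 300), edges))
print(classify_junction((200, 500), edges))
print(classify_junction((200, 310), edges))
```

In this toy setting, a read that joins the first exon directly to the third uses two annotated splice sites in an unannotated combination, while a read whose junction ends at position 310 introduces a splice site absent from the annotation. These correspond to the two kinds of novelty the abstract mentions, events involving annotated or unannotated splice sites.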
