SpliceGrapherXT: From Splice Graphs to Transcripts Using RNA-Seq

Predicting the structure of genes from RNA-Seq data remains a significant challenge in bioinformatics. Although the amount of data available for analysis is growing at an accelerating rate, the ability to use these data to construct complete gene models remains elusive, and the tools that predict novel transcripts exhibit poor accuracy. We present a novel approach to predicting splice graphs from RNA-Seq data that uses patterns of acceptor and donor sites to recognize when novel exons can be predicted unequivocally. When predicting novel exons from real and simulated data, this simple approach achieves much higher precision, and higher recall, than methods such as Cufflinks or IsoLasso. Because the ambiguities that arise in RNA-Seq data can preclude decisive predictions, we also use a realignment procedure that predicts additional novel exons while maintaining high precision. We show that these accurate splice graph predictions provide a suitable basis for accurate transcript prediction with tools such as IsoLasso and PSGInfer. Using both real and simulated data, we show that this integrated method predicts transcripts with higher recall and precision than IsoLasso or PSGInfer used alone, and than Cufflinks. SpliceGrapherXT is available from the SpliceGrapher web page at http://SpliceGrapher.sf.net.
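The precision and recall figures referred to above can be made concrete with a small sketch. It assumes, purely for illustration, that a predicted exon counts as correct only when its chromosome, strand, start, and end all match a reference exon exactly. The `Exon` type, the `precision_recall` function, and the coordinates are hypothetical and are not taken from SpliceGrapherXT or from the paper's evaluation.

```python
from typing import NamedTuple, Set, Tuple


class Exon(NamedTuple):
    """Hypothetical exon record; exact-coordinate matching is an assumption."""
    chrom: str
    start: int
    end: int
    strand: str


def precision_recall(predicted: Set[Exon], truth: Set[Exon]) -> Tuple[float, float]:
    """Return (precision, recall) for predicted exons against a reference set."""
    true_positives = len(predicted & truth)
    # Precision: fraction of predicted exons that are in the reference set.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: fraction of reference exons that were predicted.
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


# Made-up coordinates for illustration only.
truth = {
    Exon("chr1", 100, 200, "+"),
    Exon("chr1", 300, 400, "+"),
    Exon("chr1", 500, 600, "+"),
    Exon("chr1", 700, 800, "+"),
}
predicted = {
    Exon("chr1", 100, 200, "+"),
    Exon("chr1", 300, 400, "+"),
    Exon("chr1", 900, 950, "+"),  # not in the reference set: a false positive
}

precision, recall = precision_recall(predicted, truth)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Under this exact-match assumption, a method that predicts only unambiguous exons would be expected to trade some recall for precision; a looser matching rule (for example, tolerating small boundary offsets) would give different numbers.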
