Design of RNA splicing analysis null models for post hoc filtering of Drosophila head RNA-Seq data with the splicing analysis kit (Spanki)

Background: The production of multiple transcript isoforms from one gene is a major source of transcriptome complexity. RNA-Seq experiments, in which transcripts are converted to cDNA and sequenced, allow the resolution and quantification of alternative transcript isoforms. However, methods to analyze splicing are underdeveloped, and errors resulting in incorrect splicing calls occur in every experiment.

Results: We used RNA-Seq data to develop sequencing and aligner error models. By applying these error models to known input from simulations, we found that errors result from false alignment to minor splice motifs and antisense strands, shifted junction positions, paralog joining, and repeat-induced gaps. By using a series of quantitative and qualitative filters, we eliminated the diagnosed errors in the simulation, and then applied the same filters to RNA-Seq data from Drosophila melanogaster heads. We used high-confidence junction detections to specifically interrogate local splicing differences between transcripts. This method outperformed commonly used RNA-Seq methods in identifying known alternative splicing events in the Drosophila sex determination pathway. We describe a flexible software package to perform these tasks called Splicing Analysis Kit (Spanki), available at http://www.cbcb.umd.edu/software/spanki.

Conclusions: Splice-junction-centric analysis of RNA-Seq data provides advantages in specificity for detection of alternative splicing. Our software provides tools to better understand error profiles in RNA-Seq data and improve inference from this new technology. The splice-junction-centric approach that this software enables will provide more accurate estimates of differentially regulated splicing than current tools.
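
The post hoc filtering idea described in the Results can be illustrated with a minimal sketch. This is not Spanki's implementation: the `Junction` record, the motif sets, and every threshold (`min_reads`, `min_anchor`, `minor_min_reads`) are hypothetical choices made only for illustration. The sketch combines a qualitative filter (the intron's splice motif) with quantitative filters (read support and anchor length), and demands more evidence for minor motifs because the abstract identifies them as a source of false alignments.

```python
from dataclasses import dataclass

# Hypothetical motif classes (donor-acceptor dinucleotides of the intron).
MAJOR_MOTIFS = {"GT-AG"}
MINOR_MOTIFS = {"GC-AG", "AT-AC"}


@dataclass
class Junction:
    junction_id: str
    motif: str   # intron donor-acceptor dinucleotides, e.g. "GT-AG"
    reads: int   # number of reads spanning the junction
    anchor: int  # longest overhang (bases) supporting the junction


def passes_filters(j, min_reads=2, min_anchor=8, minor_min_reads=10):
    """Return True if a junction survives the illustrative filters.

    All thresholds are assumptions for this sketch, not published values.
    """
    # Quantitative filter: short anchors are prone to shifted or spurious gaps.
    if j.anchor < min_anchor:
        return False
    # Qualitative filter: major motif needs only modest read support.
    if j.motif in MAJOR_MOTIFS:
        return j.reads >= min_reads
    # Minor motifs are kept only with stronger read support.
    if j.motif in MINOR_MOTIFS:
        return j.reads >= minor_min_reads
    # Anything else (e.g. an antisense-looking motif) is rejected.
    return False


junctions = [
    Junction("a", "GT-AG", reads=5, anchor=12),
    Junction("b", "AT-AC", reads=3, anchor=20),
    Junction("c", "GT-AG", reads=50, anchor=4),
    Junction("d", "GC-AG", reads=15, anchor=10),
    Junction("e", "CT-AC", reads=30, anchor=15),
]

kept = [j.junction_id for j in junctions if passes_filters(j)]
print(kept)
```

In this toy input, junction `b` fails on read support for a minor motif, `c` fails on anchor length despite high coverage, and `e` fails because its motif is in neither set. In practice such thresholds would be tuned against simulated data with known input, as the abstract describes.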
