Detection of splicing events and multiread locations from RNA-seq data based on a geometric-tail (GT) distribution of intron length

Background: RNA sequencing (RNA-seq) measures gene expression levels and permits splicing analysis. Many existing aligners can map millions of sequencing reads onto a reference genome. For reads that map to multiple positions along the reference genome (multireads), these aligners either assign them randomly to one location or discard them altogether. Either choice can bias downstream analyses. Meanwhile, challenges remain in aligning reads that span splice junctions. Existing splicing-aware aligners that rely on read counts to identify junction sites are inevitably affected by sequencing depth.

Results: The distance between the aligned positions of paired-end (PE) reads, or between the two parts of a spliced read, depends on the experimental protocol and on gene structure. Here we propose a method that uses an empirical geometric-tail (GT) distribution of intron lengths to make a rational choice in multiread selection and splice-site detection, according to the aligned distances from PE and spliced reads.

Conclusions: GT models that combine the sequence similarity from alignment with the probability from the length distribution can accurately determine the location of both multireads and spliced reads.
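The idea of weighting candidate locations by a length distribution can be illustrated with a minimal sketch. It assumes one common form of a geometric-tail distribution: empirical frequencies for lengths below a threshold `t` and a geometric decay at or above it. The abstract does not give the parameterization used in the paper, so the functions `fit_geometric_tail` and `pick_location`, the threshold value, the additive log-score, and the toy data are all hypothetical.

```python
import math
from collections import Counter


def fit_geometric_tail(lengths, t):
    """Return a pmf over lengths: empirical head (< t), geometric tail (>= t).

    Assumed form, not taken from the paper: the tail mass equals the
    fraction of observations >= t, and the geometric parameter is the
    maximum-likelihood estimate from the excesses (length - t).
    """
    n = len(lengths)
    head = {l: c / n for l, c in Counter(l for l in lengths if l < t).items()}
    tail = [l for l in lengths if l >= t]
    tail_mass = len(tail) / n
    if tail:
        mean_excess = sum(l - t for l in tail) / len(tail)
        p = 1.0 / (1.0 + mean_excess)  # MLE for a geometric on 0, 1, 2, ...
    else:
        p = 1.0  # no tail observations; tail_mass is 0 anyway

    def pmf(length):
        if length < t:
            return head.get(length, 0.0)
        return tail_mass * p * (1.0 - p) ** (length - t)

    return pmf


def pick_location(candidates, pmf, floor=1e-12):
    """Pick the candidate with the best combined score.

    Each candidate is (location, alignment_log_score, distance). The
    combined score adds the log-probability of the distance under the
    GT model; `floor` avoids log(0) for unseen lengths.
    """
    return max(
        candidates,
        key=lambda c: c[1] + math.log(max(pmf(c[2]), floor)),
    )


# Toy data (hypothetical): observed intron lengths and two equally
# well-aligned candidate placements for the same read.
observed = [80, 90, 90, 100, 500, 1500, 2500]
gt_pmf = fit_geometric_tail(observed, t=1000)
best = pick_location(
    [("chr1:1000", -2.0, 90), ("chr5:7000", -2.0, 50000)],
    gt_pmf,
)
```

With identical alignment scores, the candidate whose implied distance is more probable under the fitted distribution is preferred, which is the kind of tie-breaking the abstract describes for multireads and spliced reads.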
