Detection of splicing events and multiread locations from RNA-seq data based on a geometric-tail (GT) distribution of intron length

RNA sequencing (RNA-Seq) measures gene expression levels and permits splicing analysis. Many aligners provide ultra-fast mapping of millions of sequencing reads onto a reference genome. However, randomly assigning or discarding reads that match multiple positions on the reference genome (multireads) can bias downstream analyses. Aligning reads that span splice junctions also remains challenging: existing splice-aware aligners that rely on read counts to identify junction sites are inevitably affected by sequencing depth. Here we propose a new method that uses an empirical geometric-tail (GT) distribution of intron lengths to make a rational choice among multiread locations and in the detection of junction sites.
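The abstract does not state how the GT distribution is parameterized or how it enters the scoring. As an illustration only, the sketch below assumes the common form of a geometric-tail distribution: empirical frequencies for intron lengths below a threshold, and geometric decay at or above it. The names `fit_geometric_tail`, `pick_candidate`, and `threshold` are hypothetical and are not taken from the paper; the ranking rule shown (choose the candidate with the most probable implied intron length) is an assumption about how such a distribution might be applied to multireads.

```python
from collections import Counter


def fit_geometric_tail(lengths, threshold):
    """Return a pmf: empirical below `threshold`, geometric at or above it.

    Assumed form, not the paper's exact model.
    """
    n = len(lengths)
    counts = Counter(lengths)
    # Head: raw relative frequencies for lengths below the threshold.
    head = {l: c / n for l, c in counts.items() if l < threshold}
    tail = [l for l in lengths if l >= threshold]
    tail_mass = len(tail) / n
    # Tail: maximum-likelihood geometric fit on offsets k = l - threshold >= 0,
    # with pmf p * (1 - p)**k, whose MLE is p = 1 / (1 + mean(k)).
    if tail:
        mean_offset = sum(l - threshold for l in tail) / len(tail)
        p = 1.0 / (1.0 + mean_offset)
    else:
        p = 1.0

    def pmf(length):
        if length < threshold:
            return head.get(length, 0.0)
        return tail_mass * p * (1.0 - p) ** (length - threshold)

    return pmf


def pick_candidate(candidate_intron_lengths, pmf):
    """Hypothetical rule: keep the candidate whose intron length is most probable."""
    return max(candidate_intron_lengths, key=pmf)
```

Under this sketch, a read with two candidate spliced alignments would be assigned to the one whose implied intron length has the higher probability under the fitted distribution, independently of how many other reads support that junction.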
