SAW: A Method to Identify Splicing Events from RNA-Seq Data Based on Splicing Fingerprints

Splicing event identification is one of the most important issues in the comprehensive analysis of transcription profile. Recent development of next-generation sequencing technology has generated an extensive profile of alternative splicing. However, while many of these splicing events are between exons that are relatively close on genome sequences, reads generated by RNA-Seq are not limited to alternative splicing between close exons but occur in virtually all splicing events. In this work, a novel method, SAW, was proposed for the identification of all splicing events based on short reads from RNA-Seq. It was observed that short reads not in known gene models are actually absent words from known gene sequences. An efficient method to filter and cluster these short reads by fingerprint fragments of splicing events without aligning short reads to genome sequences was developed. Additionally, the possible splicing sites were also determined without alignment against genome sequences. A consensus sequence was then generated for each short read cluster, which was then aligned to the genome sequences. Results demonstrated that this method could identify more than 90% of the known splicing events with a very low false discovery rate, as well as accurately identify, a number of novel splicing events between distant exons.

[1]  Lee T. Sam,et al.  Transcriptome Sequencing to Detect Gene Fusions in Cancer , 2009, Nature.

[2]  D. J. Wheeler,et al.  A Block-sorting Lossless Data Compression Algorithm , 1994 .

[3]  Christopher J. Lee,et al.  A genomic view of alternative splicing , 2002, Nature Genetics.

[4]  W. J. Kent,et al.  BLAT--the BLAST-like alignment tool. , 2002, Genome research.

[5]  Jun Wang,et al.  A compatible exon-exon junction database for the identification of exon skipping events using tandem mass spectrum data , 2008, BMC Bioinformatics.

[6]  Alan L Rockwood,et al.  Proteomic identification of oncogenic chromosomal translocation partners encoding chimeric anaplastic lymphoma kinase fusion proteins. , 2006, Proceedings of the National Academy of Sciences of the United States of America.

[7]  D. Black Mechanisms of alternative pre-messenger RNA splicing. , 2003, Annual review of biochemistry.

[8]  Armando J. Pinho,et al.  On finding minimal absent words , 2009, BMC Bioinformatics.

[9]  P. Bork,et al.  Alternative splicing and genome complexity , 2002, Nature Genetics.

[10]  P Bork,et al.  EST comparison indicates 38% of human mRNAs contain possible alternative splice forms , 2000, FEBS letters.

[11]  F. Clark,et al.  Understanding alternative splicing: towards a cellular code , 2005, Nature Reviews Molecular Cell Biology.

[12]  Gunnar Rätsch,et al.  Optimal spliced alignments of short sequence reads , 2008, BMC Bioinformatics.

[13]  B. Williams,et al.  Mapping and quantifying mammalian transcriptomes by RNA-Seq , 2008, Nature Methods.

[14]  B. Graveley Alternative splicing: increasing diversity in the proteomic world. , 2001, Trends in genetics : TIG.

[15]  B. Blencowe Alternative Splicing: New Insights from Global Analyses , 2006, Cell.

[16]  Lior Pachter,et al.  Sequence Analysis , 2020, Definitions.

[17]  Eric T. Wang,et al.  Alternative Isoform Regulation in Human Tissue Transcriptomes , 2008, Nature.