HSA: A Heuristic Splice Alignment Tool

BackgroundRNA-Seq methodology is a revolutionary transcriptomics sequencing technology, which is the representative of Next generation Sequencing (NGS). With the high throughput sequencing of RNA-Seq, we can acquire much more information like differential expression and novel splice variants from deep sequence analysis and data mining. But the short read length brings a great challenge to alignment, especially when the reads span two or more exons.MethodsA two steps heuristic splice alignment tool is generated in this investigation. First, map raw reads to reference with unspliced aligner - BWA; second, split initial unmapped reads into three equal short reads (seeds), align each seed to the reference, filter hits, search possible split position of read and extend hits to a complete match.ResultsCompare with other splice alignment tools like SOAPsplice and Tophat2, HSA has a better performance in call rate and efficiency, but its results do not as accurate as the other software to some extent.ConclusionsHSA is an effective spliced aligner of RNA-Seq reads mapping, which is available at https://github.com/vlcc/HSA.

[1]  M. Schatz,et al.  Algorithms Gage: a Critical Evaluation of Genome Assemblies and Assembly Material Supplemental , 2008 .

[2]  Siu-Ming Yiu,et al.  High Throughput Short Read Alignment via Bi-directional BWT , 2009, 2009 IEEE International Conference on Bioinformatics and Biomedicine.

[3]  W. Wong,et al.  Modeling non-uniformity in short-read rates in RNA-Seq data , 2010, Genome Biology.

[4]  Lee T. Sam,et al.  Transcriptome Sequencing to Detect Gene Fusions in Cancer , 2009, Nature.

[5]  M. Gerstein,et al.  RNA-Seq: a revolutionary tool for transcriptomics , 2009, Nature Reviews Genetics.

[6]  Gunnar Rätsch,et al.  Optimal spliced alignments of short sequence reads , 2008, BMC Bioinformatics.

[7]  M. Stephens,et al.  RNA-seq: an assessment of technical reproducibility and comparison with gene expression arrays. , 2008, Genome research.

[8]  Lior Pachter,et al.  Sequence Analysis , 2020, Definitions.

[9]  Siu-Ming Yiu,et al.  SOAPsplice: Genome-Wide ab initio Detection of Splice Junctions from RNA-Seq Data , 2011, Front. Gene..

[10]  Tobias Meilinger,et al.  Is the Map in Our Head Oriented North? , 2012, Psychological science.

[11]  Derek Y. Chiang,et al.  MapSplice: Accurate mapping of RNA-seq reads for splice junction discovery , 2010, Nucleic acids research.

[12]  Gonçalo R. Abecasis,et al.  The Sequence Alignment/Map format and SAMtools , 2009, Bioinform..

[13]  Cole Trapnell,et al.  Computational methods for transcriptome annotation and quantification using RNA-seq , 2011, Nature Methods.

[14]  Y. Xing,et al.  Detection of splice junctions from paired-end RNA-seq data by SpliceMap , 2010, Nucleic acids research.

[15]  A. Newman Pre-mRNA splicing. , 1994, Current Opinion in Genetics and Development.

[16]  Cole Trapnell,et al.  Ultrafast and memory-efficient alignment of short DNA sequences to the human genome , 2009, Genome Biology.

[17]  M. A. Nelson,et al.  The Next Generation Becomes the Now Generation , 2010, PLoS genetics.

[18]  B. Williams,et al.  Mapping and quantifying mammalian transcriptomes by RNA-Seq , 2008, Nature Methods.

[19]  Siu-Ming Yiu,et al.  SOAP2: an improved ultrafast tool for short read alignment , 2009, Bioinform..

[20]  Judea Pearl,et al.  Heuristics : intelligent search strategies for computer problem solving , 1984 .

[21]  Heng Li,et al.  A survey of sequence alignment algorithms for next-generation sequencing , 2010, Briefings Bioinform..

[22]  R. Durbin,et al.  Mapping Quality Scores Mapping Short Dna Sequencing Reads and Calling Variants Using P

, 2022 .

[23]  Zhong Wang,et al.  Next-generation transcriptome assembly , 2011, Nature Reviews Genetics.