A fast read alignment method based on seed-and-vote for next generation sequencing

BackgroundThe next-generation of sequencing technologies, along with the development of bioinformatics, are generating a growing number of reads every day. For the convenience of further research, these reads should be aligned to the reference genome by read alignment tools. Despite the diversity of read alignment tools, most have no comprehensive advantage in both accuracy and speed. For example, BWA has comparatively high accuracy, but its speed leaves much to be desired, becoming a bottleneck while an increasing number of reads need to be aligned every day. We believe that the speed of read alignment tools still has huge room for improvement, while maintaining little to no loss in accuracy.ResultsHere we implement a new read alignment tool, Fast Seed-and-Vote Aligner (FSVA), which is based on seeding and voting. FSVA achieves a high accuracy close to BWA and simultaneously has a very high speed. It only requires ~10–15 CPU hours to run a whole genome read alignment, which is ~5–7 times faster than BWA.ConclusionsIn some cases, reads have to be aligned in a short time. Where requirement of accuracy is not very stringent, FSVA would be a promising option.FSVA is available at https://github.com/Topwood91/FSVA

[1]  Heng Li,et al.  A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data , 2011, Bioinform..

[2]  Cole Trapnell,et al.  Ultrafast and memory-efficient alignment of short DNA sequences to the human genome , 2009, Genome Biology.

[3]  Gonçalo R. Abecasis,et al.  The Sequence Alignment/Map format and SAMtools , 2009, Bioinform..

[4]  W. Shi,et al.  The Subread aligner: fast, accurate and scalable read mapping by seed-and-vote , 2013, Nucleic acids research.

[5]  E. Myers,et al.  Basic local alignment search tool. , 1990, Journal of molecular biology.

[6]  Heng Li Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM , 2013, 1303.3997.

[7]  Ruiqiang Li,et al.  SOAP: short oligonucleotide alignment program , 2008, Bioinform..

[8]  Heng Li,et al.  A survey of sequence alignment algorithms for next-generation sequencing , 2010, Briefings Bioinform..

[9]  R. Durbin,et al.  Mapping Quality Scores Mapping Short Dna Sequencing Reads and Calling Variants Using P

, 2022 .

[10]  Richard Durbin,et al.  Fast and accurate long-read alignment with Burrows–Wheeler transform , 2010, Bioinform..

[11]  Richard Durbin,et al.  Sequence analysis Fast and accurate short read alignment with Burrows – Wheeler transform , 2009 .

[12]  Steven L Salzberg,et al.  Fast gapped-read alignment with Bowtie 2 , 2012, Nature Methods.

[13]  Nesime Tatbul,et al.  Large-Scale DNA Sequence Analysis in the Cloud: A Stream-Based Approach , 2011, Euro-Par Workshops.

[14]  Siu-Ming Yiu,et al.  SOAP2: an improved ultrafast tool for short read alignment , 2009, Bioinform..

[15]  D. J. Wheeler,et al.  A Block-sorting Lossless Data Compression Algorithm , 1994 .

[16]  Yongchao Liu,et al.  CUSHAW2-GPU: Empowering Faster Gapped Short-Read Alignment Using GPU Computing , 2014, IEEE Design & Test.