CUSHAW: a CUDA compatible short read aligner to large genomes based on the Burrows-Wheeler transform

MOTIVATION: New high-throughput sequencing technologies have made it possible to produce short reads at a dramatically lower unit cost. The explosive growth of short read datasets poses a challenge for mapping short reads to reference genomes, such as the human genome, in terms of both alignment quality and execution speed.

RESULTS: We present CUSHAW, a parallelized short read aligner based on the compute unified device architecture (CUDA) parallel programming model. We exploit CUDA-compatible graphics hardware as accelerators to speed up alignment. Our algorithm uses a quality-aware bounded search approach based on the Burrows-Wheeler transform (BWT) and the Ferragina-Manzini index to reduce the search space and achieve high alignment quality. Performance evaluation, using simulated as well as real short read datasets, reveals that our algorithm running on one or two graphics processing units achieves significant speedups in execution time over three popular BWT-based aligners (Bowtie, BWA and SOAP2), while yielding comparable or even better alignment quality for paired-end alignments. CUSHAW also delivers competitive performance in terms of single-nucleotide polymorphism calling for an Escherichia coli test dataset.

AVAILABILITY: http://cushaw.sourceforge.net
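
To make the role of the BWT and the Ferragina-Manzini index more concrete, the following is a minimal sketch of exact-match backward search over a toy FM-index. It is not CUSHAW's implementation: it runs on the CPU, handles exact matches only, and omits the quality-aware bounded search for mismatches described above. The function names `build_fm_index` and `backward_search`, the `$` sentinel, and the uncompressed occurrence table are assumptions made for illustration.

```python
def build_fm_index(text):
    """Build a toy FM-index for `text` (illustrative, not memory-efficient)."""
    text += "$"  # assumed sentinel: unique and smaller than every other symbol
    # Suffix array by naive sorting; fine for a small example only.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # BWT: the symbol preceding each sorted suffix (wraps to "$" for i == 0).
    bwt = "".join(text[i - 1] for i in sa)
    alphabet = sorted(set(text))
    # c_table[c]: number of symbols in text strictly smaller than c.
    c_table, total = {}, 0
    for c in alphabet:
        c_table[c] = total
        total += text.count(c)
    # occ[c][i]: number of occurrences of c in bwt[:i].
    occ = {c: [0] * (len(bwt) + 1) for c in alphabet}
    for i, ch in enumerate(bwt):
        for c in alphabet:
            occ[c][i + 1] = occ[c][i] + (ch == c)
    return sa, c_table, occ


def backward_search(pattern, sa, c_table, occ):
    """Return sorted start positions of exact occurrences of `pattern`."""
    lo, hi = 0, len(sa)  # half-open suffix-array interval [lo, hi)
    for ch in reversed(pattern):  # extend the match one symbol to the left
        if ch not in c_table:
            return []
        lo = c_table[ch] + occ[ch][lo]
        hi = c_table[ch] + occ[ch][hi]
        if lo >= hi:  # interval is empty: pattern does not occur
            return []
    return sorted(sa[lo:hi])


if __name__ == "__main__":
    sa, c_table, occ = build_fm_index("ACGTACGTTACG")
    print(backward_search("ACG", sa, c_table, occ))  # [0, 4, 9]
```

Each loop iteration narrows the suffix-array interval using only the count and occurrence tables, so the cost of an exact search grows with the read length rather than the genome length. An inexact search would branch over alternative symbols at each step, which is where a bound on the search space becomes important.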
