Exact and complete short-read alignment to microbial genomes using Graphics Processing Unit programming

MOTIVATION The introduction of next-generation sequencing techniques and especially the high-throughput systems Solexa (Illumina Inc.) and SOLiD (ABI) made the mapping of short reads to reference sequences a standard application in modern bioinformatics. Short-read alignment is needed for reference based re-sequencing of complete genomes as well as for gene expression analysis based on transcriptome sequencing. Several approaches were developed during the last years allowing for a fast alignment of short sequences to a given template. Methods available to date use heuristic techniques to gain a speedup of the alignments, thereby missing possible alignment positions. Furthermore, most approaches return only one best hit for every query sequence, thus losing the potentially valuable information of alternative alignment positions with identical scores. RESULTS We developed SARUMAN (Semiglobal Alignment of short Reads Using CUDA and NeedleMAN-Wunsch), a mapping approach that returns all possible alignment positions of a read in a reference sequence under a given error threshold, together with one optimal alignment for each of these positions. Alignments are computed in parallel on graphics hardware, facilitating an considerable speedup of this normally time-consuming step. Combining our filter algorithm with CUDA-accelerated alignments, we were able to align reads to microbial genomes in time comparable or even faster than all published approaches, while still providing an exact, complete and optimal result. At the same time, SARUMAN runs on every standard Linux PC with a CUDA-compatible graphics accelerator. AVAILABILITY http://www.cebitec.uni-bielefeld.de/brf/saruman/saruman.html.

[1]  Cole Trapnell,et al.  Ultrafast and memory-efficient alignment of short DNA sequences to the human genome , 2009, Genome Biology.

[2]  Michael Farrar,et al.  Sequence analysis Striped Smith – Waterman speeds database searches six times over other SIMD implementations , 2007 .

[3]  Siu-Ming Yiu,et al.  SOAP2: an improved ultrafast tool for short read alignment , 2009, Bioinform..

[4]  D. J. Wheeler,et al.  A Block-sorting Lossless Data Compression Algorithm , 1994 .

[5]  Michael Brudno,et al.  SHRiMP: Accurate Mapping of Short Color-space Reads , 2009, PLoS Comput. Biol..

[6]  Weiguo Liu,et al.  Bio-sequence database scanning on a GPU , 2006, Proceedings 20th IEEE International Parallel & Distributed Processing Symposium.

[7]  R. Durbin,et al.  Sequence analysis Fast and accurate short read alignment with Burrows – Wheeler transform , 2009 .

[8]  Richard Durbin,et al.  Sequence analysis Fast and accurate short read alignment with Burrows – Wheeler transform , 2009 .

[9]  Yongchao Liu,et al.  CUDASW++: optimizing Smith-Waterman sequence database searches for CUDA-enabled graphics processing units , 2009, BMC Research Notes.

[10]  Isidore Rigoutsos,et al.  FLASH: a fast look-up algorithm for string homology , 1993, Proceedings of IEEE Conference on Computer Vision and Pattern Recognition.

[11]  Esko Ukkonen,et al.  Two Algorithms for Approximate String Matching in Static Texts , 1991, MFCS.

[12]  Giorgio Valle,et al.  CUDA compatible GPU cards as efficient hardware accelerators for Smith-Waterman sequence alignment , 2008, BMC Bioinformatics.

[13]  Torbjørn Rognes,et al.  Six-fold speed-up of Smith-Waterman sequence database searches using parallel processing on common microprocessors , 2000, Bioinform..

[14]  Christophe Dessimoz,et al.  SWPS3 – fast multi-threaded vectorized Smith-Waterman for IBM Cell/B.E. and ×86/SSE2 , 2008, BMC Research Notes.

[15]  S. B. Needleman,et al.  A general method applicable to the search for similarities in the amino acid sequence of two proteins. , 1970, Journal of molecular biology.

[16]  R. Durbin,et al.  Mapping Quality Scores Mapping Short Dna Sequencing Reads and Calling Variants Using P

, 2022 .

[17]  Giorgio Valle,et al.  PASS: a program to align short sequences , 2009, Bioinform..

[18]  M S Waterman,et al.  Identification of common molecular subsequences. , 1981, Journal of molecular biology.