MUSIC: A hybrid computing environment for burrows-wheeler alignment for massive amount of short read sequence data

High-throughput DNA sequencers are becoming indispensible in our understanding of diseases at molecular level, in marker-assisted selection in agriculture and in microbial genetics research. These sequencing instruments produce enormous amount of data (often terabytes of raw data in a month) that requires efficient analysis, management and interpretation. The commonly used sequencing instrument today produces billions of short reads (upto 150 bases) from each run. The first step in the data analysis step is alignment of these short reads to the reference genome of choice. There are different open source algorithms available for sequence alignment to the reference genome. These tools normally have a high computational overhead, both in terms of number of processors and memory. Here, we propose a hybridcomputing environment called MUSIC (Mapping USIng hybrid Computing) for one of the most popular open source sequence alignment algorithm, BWA, using accelerators that show significant improvement in speed over the serial code.

[1]  D. J. Wheeler,et al.  A Block-sorting Lossless Data Compression Algorithm , 1994 .

[2]  Martin Goodson,et al.  Stampy: a statistical algorithm for sensitive and fast mapping of Illumina sequence reads. , 2011, Genome research.

[3]  J. Mullikin,et al.  SSAHA: a fast search method for large DNA databases. , 2001, Genome research.

[4]  Michael Q. Zhang,et al.  Updates to the RMAP short-read mapping software , 2009, Bioinform..

[5]  Siu-Ming Yiu,et al.  SOAP2: an improved ultrafast tool for short read alignment , 2009, Bioinform..

[6]  Serban Nacu,et al.  Fast and SNP-tolerant detection of complex variants and splicing in short reads , 2010, Bioinform..

[7]  Steven L Salzberg,et al.  Fast gapped-read alignment with Bowtie 2 , 2012, Nature Methods.

[8]  Graham Pullan,et al.  BarraCUDA - a fast short read sequence aligner using graphics processing units , 2011, BMC Research Notes.

[9]  Cole Trapnell,et al.  Ultrafast and memory-efficient alignment of short DNA sequences to the human genome , 2009, Genome Biology.

[10]  Baohong Zhang,et al.  High throughput sequencing technology reveals that the taxoid elicitor methyl jasmonate regulates microRNA expression in Chinese yew (Taxus chinensis). , 2009, Gene.

[11]  S. Nelson,et al.  BFAST: An Alignment Tool for Large Scale Genome Resequencing , 2009, PloS one.

[12]  Heng Li,et al.  A survey of sequence alignment algorithms for next-generation sequencing , 2010, Briefings Bioinform..

[13]  R. Durbin,et al.  Mapping Quality Scores Mapping Short Dna Sequencing Reads and Calling Variants Using P

, 2022 .

[14]  Amitabh Varshney,et al.  High-throughput sequence alignment using Graphics Processing Units , 2007, BMC Bioinformatics.

[15]  Ruiqiang Li,et al.  SOAP: short oligonucleotide alignment program , 2008, Bioinform..

[16]  Michael Brudno,et al.  SHRiMP: Accurate Mapping of Short Color-space Reads , 2009, PLoS Comput. Biol..

[17]  Yongchao Liu,et al.  CUSHAW: a CUDA compatible short read aligner to large genomes based on the Burrows-Wheeler transform , 2012, Bioinform..

[18]  Siu-Ming Yiu,et al.  SOAP3: ultra-fast GPU-based parallel alignment tool for short reads , 2012, Bioinform..

[19]  Michael Garland,et al.  Understanding throughput-oriented architectures , 2010, Commun. ACM.