mBWA: A Massively Parallel Sequence Reads Aligner

Mapping sequenced reads to a reference genome, also known as sequence read alignment, is central to sequence analysis. Emerging sequencing technologies such as next-generation sequencing (NGS) have led to an explosion of sequencing data that far exceeds the processing capabilities of existing alignment tools. Consequently, sequence alignment has become the bottleneck of sequence analysis, and intensive computing power is required to address this challenge. A key feature of sequence alignment is that different reads are independent of one another and can therefore be aligned in parallel. Exploiting this property, we propose a multi-level parallelization strategy to speed up BWA, a widely used sequence alignment tool, and develop a massively parallel sequence aligner, mBWA. mBWA contains two levels of parallelization: first, a three-stage parallel pipeline that overlaps data input/output (I/O) with read alignment; second, parallelization enabled by Intel Many Integrated Core (MIC) coprocessor technology. In this paper, we demonstrate that mBWA outperforms BWA through the combination of these techniques. To the best of our knowledge, mBWA is the first sequence alignment tool to run on Intel MIC, and it achieves more than 5-fold speedup over the original BWA while maintaining alignment precision.
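The three-stage pipeline idea can be illustrated with a minimal sketch. This is not mBWA's implementation: the function names (`reader`, `aligner`, `writer`, `run_pipeline`), the batch size, and the queue sizes are all hypothetical, and `toy_align` is an exact substring search standing in for BWA's actual alignment algorithm. The sketch only shows how input, alignment, and output can run concurrently on independent batches of reads.

```python
import queue
import threading

SENTINEL = None  # marks the end of the read stream


def toy_align(read, reference):
    """Placeholder aligner: exact substring search, NOT BWA's algorithm."""
    return reference.find(read)  # 0-based position, or -1 if not found


def reader(reads, batch_size, out_q):
    """Stage 1: split the input into batches and hand them downstream."""
    for i in range(0, len(reads), batch_size):
        out_q.put(reads[i:i + batch_size])
    out_q.put(SENTINEL)


def aligner(reference, in_q, out_q):
    """Stage 2: align each batch; reads are independent of one another."""
    while True:
        batch = in_q.get()
        if batch is SENTINEL:
            out_q.put(SENTINEL)  # propagate end-of-stream to the writer
            break
        out_q.put([(r, toy_align(r, reference)) for r in batch])


def writer(in_q, results):
    """Stage 3: collect aligned batches (stands in for writing output)."""
    while True:
        batch = in_q.get()
        if batch is SENTINEL:
            break
        results.extend(batch)


def run_pipeline(reads, reference, batch_size=2):
    # Bounded queues keep the stages loosely in step with each other.
    q1 = queue.Queue(maxsize=4)
    q2 = queue.Queue(maxsize=4)
    results = []
    threads = [
        threading.Thread(target=reader, args=(reads, batch_size, q1)),
        threading.Thread(target=aligner, args=(reference, q1, q2)),
        threading.Thread(target=writer, args=(q2, results)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    hits = run_pipeline(["ACGT", "TAGC", "GGGG"], "ACGTACGTTAGC")
    for read, pos in hits:
        print(read, pos)
```

With one thread per stage, batches leave the pipeline in the order they entered, so output order matches input order. In CPython, threads mainly help overlap I/O with computation; the sketch is meant to convey the pipeline structure rather than to reproduce any reported speedup.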
