Accelerating Bowtie2 with a lock-less concurrency approach and memory affinity

The implementation of DNA alignment tools for Bioinformatics lead to face different problems that dip into performances. A single alignment take an amount of time that is not predictable and there are different factors that can affect performances, for example the length of sequences can determine the computational grain of the task and mismatches or insertion/deletion (indels) increase the amount of time needed to complete an alignment. Moreover, an alignment is a strong memory-bound problem because of the irregular memory access patterns and limitations in memory-bandwidth. Over the years, many alignment tools were implemented. A concrete example is Bowtie2, one of the fastest (concurrent, Pthread-based) and state of the art not GPU-based alignment tool. Bowtie2 exploits concurrency by instantiating a pool of threads, which have access to a global input dataset, share the reference genome and have access to different objects for collecting alignment results. In this paper a modified implementation of Bowtie2 is presented, in which the concurrency structure has been changed. The proposed implementation exploits the task-farm skeleton pattern implemented as a Master-Worker. The Master-Worker pattern permits to delegate only to the master thread dataset reading and to make private to each Worker data structures that are shared in the original version. Only the reference genome is left shared. As a further optimisation, the Master and each Worker were pinned on cores and the reference genome was allocated interleaved among memory nodes. The proposed implementation is able to gain up to 10 speedup points over the original implementation.

[1]  B. Langmead,et al.  Cloud-scale RNA-sequencing differential expression analysis with Myrna , 2010, Genome Biology.

[2]  Massimo Torquati,et al.  Efficient Smith-Waterman on Multi-core with FastFlow , 2010, 2010 18th Euromicro Conference on Parallel, Distributed and Network-based Processing.

[3]  Giovanni Manzini,et al.  An experimental study of an opportunistic index , 2001, SODA '01.

[4]  Jing Zhang,et al.  Optimizing Burrows-Wheeler Transform-Based Sequence Alignment on Multicore Architectures , 2013, 2013 13th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing.

[5]  Yanghua Xiao,et al.  CGAP-Align: A High Performance DNA Short Read Alignment Tool , 2013, PloS one.

[6]  Steven L Salzberg,et al.  Fast gapped-read alignment with Bowtie 2 , 2012, Nature Methods.

[7]  Ruiqiang Li,et al.  SOAP: short oligonucleotide alignment program , 2008, Bioinform..

[8]  D. J. Wheeler,et al.  A Block-sorting Lossless Data Compression Algorithm , 1994 .

[9]  김삼묘,et al.  “Bioinformatics” 특집을 내면서 , 2000 .

[10]  Murray Cole,et al.  Algorithmic Skeletons: Structured Management of Parallel Computation , 1989 .

[11]  J. Kitzman,et al.  Personalized Copy-Number and Segmental Duplication Maps using Next-Generation Sequencing , 2009, Nature Genetics.

[12]  Christopher A. Maher,et al.  ChimeraScan: a tool for identifying chimeric transcription in sequencing data , 2011, Bioinform..

[13]  Marco Danelutto,et al.  FastFlow tutorial , 2012, ArXiv.

[14]  Richard Durbin,et al.  Sequence analysis Fast and accurate short read alignment with Burrows – Wheeler transform , 2009 .

[15]  Lucian Ilie,et al.  SHRiMP2: Sensitive yet Practical Short Read Mapping , 2011, Bioinform..

[16]  Michael Brudno,et al.  SHRiMP: Accurate Mapping of Short Color-space Reads , 2009, PLoS Comput. Biol..

[17]  Peter Kilpatrick,et al.  An Efficient Unbounded Lock-Free Queue for Multi-core Systems , 2012, Euro-Par.

[18]  Graham Pullan,et al.  BarraCUDA - a fast short read sequence aligner using graphics processing units , 2011, BMC Research Notes.

[19]  Cole Trapnell,et al.  Ultrafast and memory-efficient alignment of short DNA sequences to the human genome , 2009, Genome Biology.

[20]  Ping Liang,et al.  Faster Short DNA Sequence Alignment with Parallel BWA , 2011 .

[21]  Onur Mutlu,et al.  Accelerating read mapping with FastHASH , 2013, BMC Genomics.

[22]  Roderic Guigó,et al.  The GEM mapper: fast, accurate and versatile alignment by filtration , 2012, Nature Methods.

[23]  Qutaibah M. Malluhi,et al.  Efficient parallel implementation of the SHRiMP sequence alignment tool using MapReduce , 2012 .

[24]  Giovanni Manzini,et al.  Opportunistic data structures with applications , 2000, Proceedings 41st Annual Symposium on Foundations of Computer Science.

[25]  Massimo Torquati,et al.  On Designing Multicore-Aware Simulators for Biological Systems , 2011, 2011 19th International Euromicro Conference on Parallel, Distributed and Network-Based Processing.

[26]  Peter Kilpatrick,et al.  Accelerating Code on Multi-cores with FastFlow , 2011, Euro-Par.

[27]  Siu-Ming Yiu,et al.  SOAP2: an improved ultrafast tool for short read alignment , 2009, Bioinform..

[28]  Marco Danelutto,et al.  Skeleton-based parallel programming: Functional and parallel semantics in a single shot , 2007, Comput. Lang. Syst. Struct..

[29]  Yongchao Liu,et al.  CUSHAW: a CUDA compatible short read aligner to large genomes based on the Burrows-Wheeler transform , 2012, Bioinform..

[30]  Siu-Ming Yiu,et al.  SOAP3: ultra-fast GPU-based parallel alignment tool for short reads , 2012, Bioinform..