Parallel Biological Sequence Comparison on Heterogeneous High Performance Computing Platforms with BSP++

Biological sequence comparison is an important operation in bioinformatics that is often used to relate organisms. Smith and Waterman proposed an exact algorithm (SW) that compares two sequences in quadratic time and space. Because of its high computing and memory requirements, SW is usually executed on HPC platforms such as multicore clusters and Cell Broadband Engine (CellBE) processors. Since HPC architectures exhibit very different hardware characteristics, porting an application between them is an error-prone, time-consuming task. BSP++ is an implementation of the Bulk Synchronous Parallel (BSP) model that aims to reduce the effort needed to write parallel code. In this paper, we propose and evaluate a parallel BSP++ strategy to execute SW on multiple platforms: MPI, OpenMP, MPI/OpenMP, CellBE and MPI/CellBE. The results obtained with real DNA sequences show that the performance of our versions is comparable to that of the versions reported in the literature, which indicates the appropriateness and flexibility of our approach.
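
To make the quadratic cost of SW concrete, the following is a minimal sequential sketch of the score computation in Python. It is an illustration only, not the BSP++ strategy evaluated in the paper: the function name `smith_waterman_score`, the linear gap penalty, and the scoring values (`match=1`, `mismatch=-1`, `gap=-2`) are assumptions chosen for the example. Each cell of the matrix depends on its upper, left, and upper-left neighbors, so filling the full matrix takes time and space proportional to the product of the sequence lengths.

```python
def smith_waterman_score(a, b, match=1, mismatch=-1, gap=-2):
    """Return (best local alignment score, (i, j) where it ends).

    Scoring values and the linear gap model are illustrative assumptions.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # Full (len(a)+1) x (len(b)+1) matrix: quadratic space.
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):          # quadratic time: every cell is visited once
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(
                0,                    # a local alignment may restart anywhere
                H[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                H[i - 1][j] + gap,      # gap in b
                H[i][j - 1] + gap,      # gap in a
            )
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    return best, best_pos


if __name__ == "__main__":
    print(smith_waterman_score("TTACGTT", "GGACGGG"))
```

With these assumed scores, the two example sequences share the local match `ACG`, so the best score is 3 and the alignment ends at cell (5, 5).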
