Characterization of Smith-Waterman sequence database search in X10

Productivity and performance are often viewed as the two sides of a parallel programming language. X10 is a new object-oriented parallel language designed for both high productivity and high performance. To help the development of X10, we characterize its performance in bioinformatics using a fundamental application, Smith-Waterman (SW) sequence database search. We implement the SW application in X10 on a multi-core shared-memory architecture. By comparing it with three SW implementations in C++, we make the following suggestions for X10 and its compiler. (1) The X10 compiler should improve its array access implementation in kernel loops to avoid redundant checks and inefficient offset computation. Array access in the latest version of X10 is much slower than in C++, which results in poor single-core performance of SW in X10. (2) X10 should support the use of SIMD instructions. With 128-bit SSE instructions, SW in X10 can achieve an 8.7- to 17.7-fold speedup. Many other applications can also benefit dramatically from the SIMD units of modern processors.
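
For orientation, the kind of kernel loop that the array-access suggestion refers to can be sketched as follows. This is a minimal C++ sketch of the Smith-Waterman score recurrence, not one of the implementations evaluated here. It assumes a linear gap penalty and a simple match/mismatch score instead of a substitution matrix, and the names `sw_score`, `kMatch`, `kMismatch`, and `kGap` are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical scoring parameters, chosen only for illustration.
constexpr int kMatch = 2;      // score for identical residues
constexpr int kMismatch = -1;  // score for differing residues
constexpr int kGap = 1;        // linear gap penalty (assumption)

// Returns the best local-alignment score of `query` against `subject`.
// Only one row of the score matrix H is kept in memory.
int sw_score(const std::string& query, const std::string& subject) {
    std::vector<int> row(subject.size() + 1, 0);  // holds H[i-1][*], then H[i][*]
    int best = 0;
    for (std::size_t i = 1; i <= query.size(); ++i) {
        int diag = 0;  // H[i-1][j-1]; column 0 is always 0
        for (std::size_t j = 1; j <= subject.size(); ++j) {
            const int up = row[j];  // H[i-1][j], read before it is overwritten
            const int sub = (query[i - 1] == subject[j - 1]) ? kMatch : kMismatch;
            // Local alignment: scores are clamped at 0.
            const int h = std::max({0, diag + sub, up - kGap, row[j - 1] - kGap});
            diag = up;
            row[j] = h;
            best = std::max(best, h);
        }
    }
    return best;
}
```

The inner loop reads and writes `row` on every iteration, so any per-access bounds check or offset computation is paid once per matrix cell. A database search would call such a function once per database sequence.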
