Efficient streaming applications on multi-core with FastFlow: the biosequence alignment test-bed.

Shared-memory multi-core architectures are becoming increasingly popular. While their parallelism and peak performance are ever increasing, their efficiency is often disappointing due to memory fence overheads. In this paper we present FastFlow, a programming methodology based on lock-free queues and explicitly designed for programming streaming applications on multi-cores. The potential of FastFlow is evaluated on micro-benchmarks and on the Smith-Waterman sequence alignment application, where it achieves a substantial speedup over the state-of-the-art multi-threaded implementation (SWPS3 x86/SSE2).
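
To make the idea of a lock-free queue connecting streaming stages concrete, the following is a minimal sketch of a bounded single-producer/single-consumer ring buffer feeding a two-stage pipeline. It is not FastFlow's implementation or API: the names `SpscQueue` and `stream_sum`, the capacity of 64, and the use of C++11 `std::atomic` acquire/release operations are all assumptions made for illustration only.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical bounded single-producer/single-consumer queue.
// Illustrative only; this is NOT FastFlow's queue or its API.
template <typename T>
class SpscQueue {
public:
    // One extra slot lets us tell "full" apart from "empty".
    explicit SpscQueue(std::size_t capacity)
        : buf_(capacity + 1), head_(0), tail_(0) {
        assert(capacity > 0);
    }

    // Producer thread only. Returns false when the queue is full.
    bool push(const T& value) {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        const std::size_t next = (tail + 1) % buf_.size();
        if (next == head_.load(std::memory_order_acquire)) return false;
        buf_[tail] = value;
        // Publish the slot: the consumer's acquire load of tail_ sees the write.
        tail_.store(next, std::memory_order_release);
        return true;
    }

    // Consumer thread only. Returns false when the queue is empty.
    bool pop(T& out) {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        if (head == tail_.load(std::memory_order_acquire)) return false;
        out = buf_[head];
        // Release the slot back to the producer.
        head_.store((head + 1) % buf_.size(), std::memory_order_release);
        return true;
    }

private:
    std::vector<T> buf_;
    std::atomic<std::size_t> head_;  // next slot to read (written by consumer)
    std::atomic<std::size_t> tail_;  // next slot to write (written by producer)
};

// Hypothetical two-stage stream: one thread emits 1..n, the other sums them.
std::uint64_t stream_sum(std::uint64_t n) {
    SpscQueue<std::uint64_t> q(64);
    std::uint64_t sum = 0;

    std::thread producer([&q, n] {
        for (std::uint64_t i = 1; i <= n; ++i)
            while (!q.push(i)) std::this_thread::yield();  // queue full: retry
    });
    std::thread consumer([&q, &sum, n] {
        std::uint64_t item = 0;
        for (std::uint64_t received = 0; received < n;) {
            if (q.pop(item)) { sum += item; ++received; }
            else std::this_thread::yield();                // queue empty: retry
        }
    });

    producer.join();
    consumer.join();
    return sum;  // safe to read: join() orders the consumer's writes before this
}
```

Because each index is written by exactly one thread, no mutex or compare-and-swap loop is needed; the only synchronization is one acquire load and one release store per operation. Calling `stream_sum(1000)` streams the integers 1 through 1000 across the queue and returns their sum. How closely this matches the synchronization cost of the queues evaluated in the paper is not something the sketch claims.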
