Efficient streaming applications on multi-core with FastFlow: the biosequence alignment test-bed

Shared-memory multi-core architectures are becoming increasingly popular. While their parallelism and peak performance are ever increasing, their efficiency is often disappointing due to memory fence overheads. In this paper we present FastFlow, a programming methodology based on lock-free queues explicitly designed for programming streaming applications on multi-cores. The potential of FastFlow is evaluated on micro-benchmarks and on the Smith-Waterman sequence alignment application, whose FastFlow implementation exhibits a substantial speedup over the state-of-the-art multi-threaded implementation (SWPS3 x86/SSE2).
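The following minimal Python sketch makes the application kernel concrete. It computes the Smith-Waterman local alignment score under simplifying assumptions: a linear gap penalty and illustrative match/mismatch scores. The defaults `match=2`, `mismatch=-1`, `gap=-1` and the function names `smith_waterman_score` and `scan_database` are hypothetical and are not taken from FastFlow or SWPS3. Production codes such as SWPS3 typically use affine gap penalties, substitution matrices and SIMD vectorization instead. The sketch also shows that scoring a query against each database sequence is an independent task, so the database can be fed as a stream to a pool of workers.

```python
from typing import List


def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -1) -> int:
    """Best local alignment score of a and b (linear gap penalty, illustrative)."""
    # H[i][j] is the best score of a local alignment ending at a[i-1] and b[j-1].
    # Each row depends only on the previous one, so two rows are enough.
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap          # a[i-1] aligned against a gap
            left = curr[j - 1] + gap    # b[j-1] aligned against a gap
            # The 0 lets a new local alignment start at any cell.
            curr[j] = max(0, diag, up, left)
            best = max(best, curr[j])
        prev = curr
    return best


def scan_database(query: str, database: List[str]) -> List[int]:
    """Score the query against every database sequence.

    Each call to smith_waterman_score is independent of the others, which is
    what makes the database scan a natural stream of tasks for parallel workers.
    This sequential loop is only a hypothetical stand-in for such a farm.
    """
    return [smith_waterman_score(query, s) for s in database]
```

With these parameters, `scan_database("ACGT", ["ACGT", "TTTT"])` returns `[8, 2]`. The first entry is four matches and the second is a single T/T match.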
