Accelerating sequential programs using FastFlow and self-offloading

FastFlow is a programming environment that specifically targets cache-coherent shared-memory multi-cores. It is implemented as a stack of C++ template libraries built on top of lock-free (and fence-free) synchronization mechanisms. In this paper we present a further evolution of FastFlow that enables programmers to offload part of their workload onto a dynamically created software accelerator running on unused CPUs. The offloaded function can be easily derived from pre-existing sequential code. We emphasize in particular the effective trade-off between human productivity and execution efficiency that the approach offers.
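The self-offloading idea can be illustrated with a minimal sketch in standard C++. It is not the FastFlow API: the `Accelerator` class, its `offload` and `wait` methods, and the `square` kernel are hypothetical names chosen for illustration, and the sketch assumes a mutex-protected queue in place of FastFlow's lock-free queues. It shows only the pattern: the body of a sequential loop becomes the offloaded function, the loop pushes tasks to a helper thread, and the caller collects the results at the end.

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical illustration of self-offloading; NOT the FastFlow API.
// Assumption: a mutex-guarded queue stands in for FastFlow's lock-free queues.
class Accelerator {
public:
    // Starts one helper thread that applies fn to every offloaded task.
    explicit Accelerator(std::function<long(long)> fn)
        : fn_(std::move(fn)), worker_([this] { run(); }) {}

    ~Accelerator() { wait(); }

    // Called by the sequential code: hand one task to the accelerator.
    void offload(long task) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            tasks_.push(task);
        }
        cv_.notify_one();
    }

    // Signals end of stream, waits for the helper thread, returns the results.
    std::vector<long> wait() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            done_ = true;
        }
        cv_.notify_one();
        if (worker_.joinable()) worker_.join();
        return results_;
    }

private:
    void run() {
        for (;;) {
            std::unique_lock<std::mutex> lock(mutex_);
            cv_.wait(lock, [this] { return done_ || !tasks_.empty(); });
            if (tasks_.empty()) return;  // end of stream and queue drained
            long task = tasks_.front();
            tasks_.pop();
            lock.unlock();               // compute outside the critical section
            results_.push_back(fn_(task));
        }
    }

    std::function<long(long)> fn_;
    std::mutex mutex_;
    std::condition_variable cv_;
    std::queue<long> tasks_;
    std::vector<long> results_;  // written only by the helper until join
    bool done_ = false;
    std::thread worker_;         // declared last so it starts after the other members exist
};

// The kernel extracted from the body of the original sequential loop.
long square(long x) { return x * x; }

// Sequential version: for (i = 1..n) sum += square(i);
// Offloaded version: the loop only pushes tasks; the helper thread computes.
long sum_of_squares_offloaded(long n) {
    Accelerator acc(square);
    for (long i = 1; i <= n; ++i) acc.offload(i);
    long sum = 0;
    for (long r : acc.wait()) sum += r;
    return sum;
}
```

Calling `sum_of_squares_offloaded(n)` from an otherwise sequential program gives the same result as the original loop. With a single helper thread and a kernel this cheap no speedup should be expected; the sketch is meant to show how little the calling code has to change.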
