A parallel pattern for iterative stencil + reduce

We advocate the Loop-of-stencil-reduce pattern as a means of simplifying the implementation of data-parallel programs on heterogeneous multi-core platforms. Loop-of-stencil-reduce is general enough to subsume map, reduce, map-reduce, stencil, stencil-reduce, and, crucially, their usage in a loop in both data-parallel and streaming applications, or a combination of both. The pattern makes it possible to deploy a single stencil computation kernel on different GPUs. We discuss the implementation of Loop-of-stencil-reduce in FastFlow, a framework for the implementation of applications based on the parallel patterns. Experiments are presented to illustrate the use of Loop-of-stencil-reduce in developing data-parallel kernels running on heterogeneous systems.

[1]  Marco Danelutto,et al.  Structured Parallel Programming with "core" FastFlow , 2013, CEFP.

[2]  Christoph W. Kessler,et al.  SkePU: a multi-backend skeleton programming library for multi-GPU systems , 2010, HLPP '10.

[3]  Herbert Kuchen,et al.  Data Parallel Skeletons for GPU Clusters and Multi-GPU Systems , 2011, PARCO.

[4]  Peter Kilpatrick,et al.  The Loop-of-Stencil-Reduce Paradigm , 2015, 2015 IEEE Trustcom/BigDataSE/ISPA.

[5]  Cédric Augonnet,et al.  StarPU: a unified platform for task scheduling on heterogeneous multicore architectures , 2011, Concurr. Comput. Pract. Exp..

[6]  Peter Kilpatrick,et al.  Accelerating Code on Multi-cores with FastFlow , 2011, Euro-Par.

[7]  Marco Danelutto,et al.  ASSIST As a Research Framework for High-Performance Grid Programming Environments , 2006, Grid Computing: Software Environments and Tools.

[8]  Torquati Massimo,et al.  Efficient streaming applications on multi-core with FastFlow: the biosequence alignment test-bed. , 2009 .

[9]  Murray Cole,et al.  PARTANS: An autotuning framework for stencil computation on multi-GPU systems , 2013, TACO.

[10]  Alejandro Duran,et al.  Productive Programming of GPU Clusters with OmpSs , 2012, 2012 IEEE 26th International Parallel and Distributed Processing Symposium.

[11]  Sergei Gorlatch,et al.  Extending the SkelCL Skeleton Library for Stencil Computations on Multi-GPU Systems , 2014 .

[12]  Master Gardener,et al.  Mathematical games: the fantastic combinations of john conway's new solitaire game "life , 1970 .

[13]  Murray Cole,et al.  Bringing skeletons out of the closet: a pragmatic manifesto for skeletal parallel programming , 2004, Parallel Comput..

[14]  Sergei Gorlatch,et al.  SkelCL: Enhancing OpenCL for High-Level Programming of Multi-GPU Systems , 2013, PaCT.

[15]  Concetto Spampinato,et al.  Parallel visual data restoration on multi-GPGPUs using stencil-reduce pattern , 2015, Int. J. High Perform. Comput. Appl..

[16]  Horacio González-Vélez,et al.  A survey of algorithmic skeleton frameworks: high‐level structured parallel programming enablers , 2010, Softw. Pract. Exp..

[17]  Peter Kilpatrick,et al.  Efficient streaming applications on multi-core with FastFlow: the biosequence alignment test-bed , 2009, PARCO.