Paper information - A parallel pattern for iterative stencil + reduce

We advocate the Loop-of-stencil-reduce pattern as a means of simplifying the implementation of data-parallel programs on heterogeneous multi-core platforms. Loop-of-stencil-reduce is general enough to subsume map, reduce, map-reduce, stencil, and stencil-reduce and, crucially, their use inside a loop, in data-parallel applications, streaming applications, or a combination of both. The pattern makes it possible to deploy a single stencil computation kernel on different GPUs. We discuss the implementation of Loop-of-stencil-reduce in FastFlow, a framework for implementing applications based on parallel patterns. Experiments are presented to illustrate the use of Loop-of-stencil-reduce in developing data-parallel kernels running on heterogeneous systems.
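To make the shape of the pattern concrete, the following is a minimal sequential sketch in C++ of what one loop of stencil followed by reduce computes. It is not the FastFlow API and does not reflect how the paper's implementation targets GPUs. The names `loop_of_stencil_reduce` and `jacobi_demo`, the one-dimensional data layout, and the choice of a Jacobi relaxation as the example are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of FastFlow): sequential reference semantics
// of a 1-D loop-of-stencil-reduce.
//   stencil(in, i)        -> new value of element i, read from in[i] and neighbours
//   measure(before,after) -> per-element value fed to the reduction
//   combine(a, b)         -> associative reduction operator with the given identity
//   keep_going(r, iter)   -> loop condition evaluated on the reduced value
// Returns the number of iterations executed.
template <typename T, typename R, typename Stencil, typename Measure,
          typename Combine, typename Cond>
std::size_t loop_of_stencil_reduce(std::vector<T>& data, Stencil stencil,
                                   Measure measure, Combine combine,
                                   R identity, Cond keep_going) {
    assert(!data.empty());
    std::vector<T> next(data.size());
    std::size_t iter = 0;
    R reduced = identity;
    do {
        // Stencil phase: every output reads only the previous buffer, so the
        // elements are independent and could be computed in parallel.
        for (std::size_t i = 0; i < data.size(); ++i)
            next[i] = stencil(data, i);

        // Reduce phase: fold a per-element measure into a single value.
        reduced = identity;
        for (std::size_t i = 0; i < data.size(); ++i)
            reduced = combine(reduced, measure(data[i], next[i]));

        data.swap(next);  // the new buffer becomes the input of the next iteration
        ++iter;
    } while (keep_going(reduced, iter));
    return iter;
}

// Example instance (an assumption, not taken from the paper): 1-D Jacobi
// relaxation with fixed end points, iterated until the largest per-element
// change drops below eps or max_iter is reached.
inline std::size_t jacobi_demo(std::vector<double>& u, double eps,
                               std::size_t max_iter) {
    auto stencil = [](const std::vector<double>& in, std::size_t i) {
        if (i == 0 || i + 1 == in.size()) return in[i];  // boundary kept fixed
        return 0.5 * (in[i - 1] + in[i + 1]);
    };
    auto measure = [](double before, double after) {
        return std::fabs(after - before);
    };
    auto combine = [](double a, double b) { return a > b ? a : b; };  // max
    auto keep_going = [=](double max_delta, std::size_t iter) {
        return max_delta >= eps && iter < max_iter;
    };
    return loop_of_stencil_reduce(u, stencil, measure, combine, 0.0, keep_going);
}
```

In this sketch the simpler patterns named in the abstract appear as special cases: a stencil that reads only `in[i]` is a map, a loop condition that always returns false gives a single stencil-reduce step, and an identity stencil leaves a plain reduce. A parallel implementation would distribute the two inner loops across cores or devices while keeping the same loop structure.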
