SkePU 3: Portable High-Level Programming of Heterogeneous Systems and HPC Clusters

We present the third generation of the C++-based open-source skeleton programming framework SkePU. Its main new features include new skeletons, new data container types, support for returning multiple objects from skeleton instances and user functions, support for specifying alternative platform-specific user functions to exploit, e.g., custom SIMD instructions, generalized scheduling variants for the multicore CPU backends, and a new cluster backend targeting the custom MPI interface provided by the StarPU task-based runtime system. We have also revised the smart data containers’ memory consistency model for automatic data sharing between main and device memory. The new features are the result of a two-year co-design effort collecting feedback from HPC application partners in the EU H2020 project EXA2PRO, and they especially target the HPC application domain and HPC platforms. We evaluate the performance effects of the new features on high-end multicore CPU and GPU systems and on HPC clusters.
