Cluster-SkePU: A Multi-Backend Skeleton Programming Library for GPU Clusters

SkePU is a C++ template library with a simple and unified interface for expressing data parallel computations in terms of generic components, called skeletons, on multi-GPU systems using CUDA and OpenCL. The smart containers in SkePU, such as Matrix and Vector, perform data management with a lazy memory copying mechanism that reduces redundant data communication. SkePU provides programmability, portability and even performance portability, but up to now application written using SkePU could only run on a single multi-GPU node. We present the extension of SkePU for GPU clusters without the need to modify the SkePU application source code. With our prototype implementation, we performed two experiments. The first experiment demonstrates the scalability with regular algorithms for N-body simulation and electric field calculation over multiple GPU nodes. The results for the second experiment show the benefit of lazy memory copying in terms of speedup gained for one level of Strassen’s algorithm and another synthetic matrix sum application.

[1]  Christoph W. Kessler,et al.  SkePU: a multi-backend skeleton programming library for multi-GPU systems , 2010, HLPP '10.

[2]  Murray Cole,et al.  Algorithmic Skeletons: Structured Management of Parallel Computation , 1989 .

[3]  D. N. Ranasinghe,et al.  Accelerating high performance applications with CUDA and MPI , 2009, 2009 International Conference on Industrial and Information Systems (ICIIS).

[4]  Christoph W. Kessler,et al.  Auto-tuning SkePU: a multi-backend skeleton programming framework for multi-GPU systems , 2011, IWMSE '11.

[5]  Herbert Kuchen,et al.  Algorithmic skeletons for multi-core, multi-GPU systems and clusters , 2012, Int. J. High Perform. Comput. Netw..

[6]  Christoph W. Kessler,et al.  BlockLib: a skeleton library for cell broadband engine , 2008, IWMSE '08.

[7]  Frédéric Loulergue Proceedings of the fourth international workshop on High-level parallel programming and applications , 2010, ICFP 2010.

[8]  Murray Cole A “skeletal” approach to the exploitation of parallelism , 1989 .

[9]  Sergei Gorlatch,et al.  SkelCL - A Portable Skeleton Library for High-Level GPU Programming , 2011, 2011 IEEE International Symposium on Parallel and Distributed Processing Workshops and Phd Forum.

[10]  Inanc Senocak,et al.  An MPI-CUDA Implementation for Massively Parallel Incompressible Flow Computations on Multi-GPU Clusters , 2010 .

[11]  Hans Werner Meuer,et al.  Top500 Supercomputer Sites , 1997 .

[12]  Katherine Yelick,et al.  Data Structures for Irregular Applications , 1993 .