Self-Adaptive OmpSs Tasks in Heterogeneous Environments

As new heterogeneous systems and hardware accelerators appear, high-performance computers can reach higher levels of computational power. Nevertheless, this does not come for free: the more heterogeneous the system, the more complex the programming task becomes in terms of resource management. OmpSs is a task-based programming model and framework focused on the runtime exploitation of parallelism from annotated sequential applications. This paper presents a set of extensions to this framework: we show how the application programmer can expose several specialized versions of a task (i.e., pieces of code targeted at and optimized for a particular architecture) and how the system can choose among these versions at runtime to obtain the best achievable performance for a given application. From the results obtained on a multi-GPU system, we show that our proposal adds flexibility to the application's source code and can potentially increase application performance.
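To make the versioning mechanism concrete, below is a minimal sketch of how two interchangeable implementations of one task could be declared, assuming the OmpSs target/implements/ndrange clauses; the function names, the vector-scaling computation, and the kernel launch parameters are illustrative choices, not code taken from the paper.

    #include <stdio.h>

    /* Generic SMP implementation of the task: scale a vector in place.
       The names scale_task/scale_task_cuda and the computation are
       hypothetical; only the pragma clauses follow OmpSs syntax. */
    #pragma omp target device(smp)
    #pragma omp task inout([n]v)
    void scale_task(double *v, double a, int n)
    {
        for (int i = 0; i < n; i++)
            v[i] *= a;
    }

    /* CUDA kernel declared as an alternative implementation of the same
       task: implements(scale_task) marks the two versions as
       interchangeable, and ndrange(1, n, 128) gives the launch
       configuration. The kernel body would typically live in a
       separate .cu file. */
    #pragma omp target device(cuda) implements(scale_task) ndrange(1, n, 128)
    #pragma omp task inout([n]v)
    __global__ void scale_task_cuda(double *v, double a, int n);

    int main(void)
    {
        double v[1024];
        for (int i = 0; i < 1024; i++)
            v[i] = (double)i;

        /* Each call creates one task instance; the runtime is free to
           run either version, e.g. guided by measured execution times. */
        scale_task(v, 2.0, 1024);
        #pragma omp taskwait

        printf("v[1] = %f\n", v[1]);
        return 0;
    }

With such declarations in place, a self-adaptive scheduler can time each version on its device and route subsequent task instances to whichever implementation has performed best so far.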
