CoreTSAR: Core Task-Size Adapting Runtime

Heterogeneity continues to increase at all levels of computing, as accelerators such as GPUs, FPGAs, and other co-processors spread into everything from desktops to supercomputers. As a consequence, efficiently managing such disparate resources has become increasingly complex. CoreTSAR seeks to reduce this complexity by adaptively worksharing parallel-loop regions across compute resources without requiring any transformation of the code within the loop. Our results show performance improvements of up to three-fold over a current state-of-the-art heterogeneous task scheduler, as well as linear performance scaling from a single GPU to four GPUs for many codes. In addition, CoreTSAR demonstrates a robust ability to adapt to both a variety of workloads and underlying system configurations.
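To make the idea of adaptive worksharing concrete, the following is a minimal sketch in C with OpenMP. It is not CoreTSAR's actual API or scheduler; the function run_partition, the two "resources", the 10% smoothing factor, and the pass structure are all illustrative assumptions. The sketch splits a loop's iteration range between two partitions, measures each partition's throughput, and re-derives the split ratio for the next pass so that work is assigned in proportion to observed speed. A real runtime would execute the partitions concurrently on distinct devices (e.g., CPU cores and a GPU); here they run one after the other purely to keep the example self-contained.

```c
/* Minimal sketch of ratio-based adaptive worksharing: split a loop
 * between two resources and rebalance from measured throughput.
 * Names and constants are illustrative, not CoreTSAR's interface. */
#include <stdio.h>
#include <omp.h>

/* Stand-in for executing iterations [begin, end) on one resource;
 * returns elapsed seconds. A real runtime would dispatch this range
 * to a CPU thread team or a GPU kernel instead. */
static double run_partition(double *a, long begin, long end) {
    double t0 = omp_get_wtime();
    #pragma omp parallel for
    for (long i = begin; i < end; i++)
        a[i] = a[i] * 2.0 + 1.0;
    return omp_get_wtime() - t0;
}

int main(void) {
    enum { N = 1 << 20, PASSES = 5 };
    static double a[N];
    double ratio = 0.5;  /* fraction of iterations given to resource 0 */

    for (int pass = 0; pass < PASSES; pass++) {
        long split = (long)(ratio * N);
        double t0 = run_partition(a, 0, split);   /* resource 0's share */
        double t1 = run_partition(a, split, N);   /* resource 1's share */

        /* Iterations per second on each resource; guard empty splits. */
        double r0 = (split > 0)     ? split / t0       : 0.0;
        double r1 = (N - split > 0) ? (N - split) / t1 : 0.0;

        /* Move the ratio toward each resource's share of total
         * throughput, smoothed to damp measurement noise. */
        if (r0 + r1 > 0.0)
            ratio = 0.9 * ratio + 0.1 * (r0 / (r0 + r1));
        printf("pass %d: ratio %.3f (%.3fs / %.3fs)\n", pass, ratio, t0, t1);
    }
    return 0;
}
```

Note that this captures only the rebalancing idea: the loop body itself is untouched, and only the iteration bounds handed to each resource change between passes, which mirrors the abstract's claim of worksharing without transforming the code within the loop.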
