CoreTSAR: Core Task-Size Adapting Runtime

Heterogeneity continues to increase at all levels of computing, as accelerators such as GPUs, FPGAs, and other co-processors spread into everything from desktops to supercomputers. As a consequence, efficiently managing such disparate resources has become increasingly complex. CoreTSAR seeks to reduce this complexity by adaptively worksharing parallel-loop regions across compute resources without requiring any transformation of the code within the loop. Our results show performance improvements of up to three-fold over a current state-of-the-art heterogeneous task scheduler, as well as linear performance scaling from a single GPU to four GPUs for many codes. In addition, CoreTSAR demonstrates a robust ability to adapt to both a variety of workloads and underlying system configurations.
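To make the idea of adaptive worksharing concrete, the following is a minimal sketch in C with OpenMP. It is not CoreTSAR's actual API or scheduler; the function run_partition, the two "resources", the 10% smoothing factor, and the pass structure are all illustrative assumptions. The sketch splits a loop's iteration range between two partitions, measures each partition's throughput, and re-derives the split ratio for the next pass so that work is assigned in proportion to observed speed. A real runtime would execute the partitions concurrently on distinct devices (e.g., CPU cores and a GPU); here they run one after the other purely to keep the example self-contained.

```c
/* Minimal sketch of ratio-based adaptive worksharing: split a loop
 * between two resources and rebalance from measured throughput.
 * Names and constants are illustrative, not CoreTSAR's interface. */
#include <stdio.h>
#include <omp.h>

/* Stand-in for executing iterations [begin, end) on one resource;
 * returns elapsed seconds. A real runtime would dispatch this range
 * to a CPU thread team or a GPU kernel instead. */
static double run_partition(double *a, long begin, long end) {
    double t0 = omp_get_wtime();
    #pragma omp parallel for
    for (long i = begin; i < end; i++)
        a[i] = a[i] * 2.0 + 1.0;
    return omp_get_wtime() - t0;
}

int main(void) {
    enum { N = 1 << 20, PASSES = 5 };
    static double a[N];
    double ratio = 0.5;  /* fraction of iterations given to resource 0 */

    for (int pass = 0; pass < PASSES; pass++) {
        long split = (long)(ratio * N);
        double t0 = run_partition(a, 0, split);   /* resource 0's share */
        double t1 = run_partition(a, split, N);   /* resource 1's share */

        /* Iterations per second on each resource; guard empty splits. */
        double r0 = (split > 0)     ? split / t0       : 0.0;
        double r1 = (N - split > 0) ? (N - split) / t1 : 0.0;

        /* Move the ratio toward each resource's share of total
         * throughput, smoothed to damp measurement noise. */
        if (r0 + r1 > 0.0)
            ratio = 0.9 * ratio + 0.1 * (r0 / (r0 + r1));
        printf("pass %d: ratio %.3f (%.3fs / %.3fs)\n", pass, ratio, t0, t1);
    }
    return 0;
}
```

Note that this captures only the rebalancing idea: the loop body itself is untouched, and only the iteration bounds handed to each resource change between passes, which mirrors the abstract's claim of worksharing without transforming the code within the loop.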
