Generic Algorithms for Scheduling Applications on Hybrid Multi-core Machines

We study the problem of executing an application represented by a precedence task graph on a multi-core machine composed of standard computing cores (CPUs) and accelerators (GPUs). Contrary to most existing approaches, we distinguish the allocation and the scheduling phases, and we mainly focus on the allocation part of the problem: choosing the most appropriate type of computing unit for each task. We address both the off-line and the on-line settings. In the off-line case, we establish strong lower bounds on the worst-case performance of a known approach based on Linear Programming for solving the allocation problem. Then, we refine the scheduling phase by replacing the greedy list scheduling policy used in this approach with a better ordering of the tasks. Although this modification yields the same approximation guarantees, it performs much better in practice. In the on-line case, we assume that the tasks arrive in an arbitrary order, not known in advance, that respects the precedence relations, and that the scheduler must take irrevocable decisions about their allocation and execution. In this setting, we propose the first on-line scheduling algorithm that takes precedence constraints into account. Our algorithm relies on suitable rules for selecting the type of processor on which each task is allocated, and it achieves a constant-factor approximation guarantee when the ratio of the number of CPUs to the number of GPUs is bounded. Finally, all the above algorithms have been evaluated through a large number of simulations built on actual libraries. These simulations show that the algorithms behave well in practice compared with state-of-the-art solutions where these exist, and with baseline algorithms otherwise.
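The two-phase structure described above (first choose a processor type for every task, then order the tasks with a list-scheduling policy) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the paper's algorithm: the fractional CPU shares `x[t]` are assumed to come from an LP solver that is not shown, the 1/2 rounding threshold is an assumed rule, and the bottom-level (longest remaining path) priority is one common task ordering, not necessarily the ordering proposed here. All function names and the instance format are hypothetical.

```python
import heapq
from typing import Dict, List, Optional, Tuple

# Hypothetical instance format: times[t] = (cpu_time, gpu_time),
# preds[t] = predecessors of t (a DAG), alloc[t] in {"cpu", "gpu"}.
Times = Dict[str, Tuple[float, float]]
Preds = Dict[str, List[str]]


def round_allocation(x: Dict[str, float]) -> Dict[str, str]:
    """Assumed rounding rule: CPU if the fractional CPU share x[t] >= 1/2."""
    return {t: ("cpu" if xt >= 0.5 else "gpu") for t, xt in x.items()}


def successors(times: Times, preds: Preds) -> Dict[str, List[str]]:
    succs: Dict[str, List[str]] = {t: [] for t in times}
    for t, ps in preds.items():
        for p in ps:
            succs[p].append(t)
    return succs


def duration(times: Times, alloc: Dict[str, str], t: str) -> float:
    # Processing time of t on the processor type it was allocated to.
    return times[t][0] if alloc[t] == "cpu" else times[t][1]


def bottom_levels(times: Times, preds: Preds, alloc: Dict[str, str]) -> Dict[str, float]:
    """Longest path from each task to a sink, using allocated processing times."""
    succs = successors(times, preds)
    memo: Dict[str, float] = {}

    def bl(t: str) -> float:
        if t not in memo:
            memo[t] = duration(times, alloc, t) + max(
                (bl(s) for s in succs[t]), default=0.0)
        return memo[t]

    for t in times:
        bl(t)
    return memo


def list_schedule(times: Times, preds: Preds, alloc: Dict[str, str], m: int, k: int,
                  priority: Optional[Dict[str, float]] = None):
    """Greedy list scheduling on m CPUs and k GPUs; returns (makespan, start times)."""
    succs = successors(times, preds)
    indeg = {t: len(preds.get(t, [])) for t in times}
    prio = priority or {t: 0.0 for t in times}
    free = {"cpu": m, "gpu": k}
    ready: Dict[str, list] = {"cpu": [], "gpu": []}  # max-heaps via negated priority
    for t in times:
        if indeg[t] == 0:
            heapq.heappush(ready[alloc[t]], (-prio[t], t))
    running: list = []  # min-heap of (finish_time, task)
    start: Dict[str, float] = {}
    now, done = 0.0, 0
    while done < len(times):
        # Never leave a unit idle while a ready task of its type is waiting.
        for kind in ("cpu", "gpu"):
            while free[kind] > 0 and ready[kind]:
                _, t = heapq.heappop(ready[kind])
                free[kind] -= 1
                start[t] = now
                heapq.heappush(running, (now + duration(times, alloc, t), t))
        if not running:
            raise ValueError("stuck: cycle in the graph or no unit of a needed type")
        # Advance to the next completion time and release every task finishing then.
        now, t = heapq.heappop(running)
        finished = [t]
        while running and running[0][0] == now:
            finished.append(heapq.heappop(running)[1])
        for t in finished:
            done += 1
            free[alloc[t]] += 1
            for s in succs[t]:
                indeg[s] -= 1
                if indeg[s] == 0:
                    heapq.heappush(ready[alloc[s]], (-prio[s], s))
    return now, start


# Usage on a small diamond-shaped graph a -> {b, c} -> d, with 2 CPUs and 1 GPU.
tasks = {"a": (2.0, 1.0), "b": (4.0, 1.0), "c": (1.0, 3.0), "d": (2.0, 2.0)}
preds = {"b": ["a"], "c": ["a"], "d": ["b", "c"]}
alloc = round_allocation({"a": 0.2, "b": 0.0, "c": 0.9, "d": 0.5})
makespan, start = list_schedule(tasks, preds, alloc, m=2, k=1,
                                priority=bottom_levels(tasks, preds, alloc))
print(makespan)  # 4.0
```

Passing `priority=None` makes all ready tasks equally urgent, so they start in name order; this stands in for a plain greedy policy, whereas the bottom-level priority favors tasks that head long remaining chains. Keeping allocation and ordering as separate functions mirrors the separation of the two phases, so either one can be swapped independently.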
