Online Scheduling of Task Graphs on Heterogeneous Platforms

Modern computing platforms commonly include accelerators. We target the problem of scheduling applications modeled as task graphs on hybrid platforms made of two types of resources, such as CPUs and GPUs. We consider that task graphs are revealed dynamically, and that the scheduler has information only on the available tasks, i.e., tasks whose predecessors have all been completed. Each task can be processed by either a CPU or a GPU, and the corresponding processing times are known. Our study extends a previous $4\sqrt{m/k}$-competitive online algorithm by Amaris et al. [18], where $m$ is the number of CPUs and $k$ the number of GPUs ($m \geq k$).
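
The online model described above can be made concrete with a small event-driven simulation. The sketch below is illustrative only. The function name `online_makespan`, the task format, and the allocation rule are assumptions and are not the authors' algorithm, which the abstract does not detail. Under the assumed rule, a task goes to a GPU when its acceleration factor (CPU time divided by GPU time) is at least $\sqrt{m/k}$. Each allocated task then starts on the earliest-available processor of its type, and a task is revealed to the scheduler only when all its predecessors have finished.

```python
import heapq
import math
from typing import Dict, List, Tuple

# Hypothetical task format: task id -> (cpu_time, gpu_time, predecessor ids).
Task = Tuple[float, float, List[str]]


def online_makespan(tasks: Dict[str, Task], m: int, k: int) -> float:
    """Greedy online list scheduling on m CPUs and k GPUs (m >= k >= 1).

    A task is seen by the scheduler only once all its predecessors have
    completed. Allocation rule (assumed for illustration): run the task on a
    GPU if cpu_time / gpu_time >= sqrt(m / k), otherwise on a CPU.
    Returns the makespan of the resulting schedule.
    """
    threshold = math.sqrt(m / k)
    succs: Dict[str, List[str]] = {t: [] for t in tasks}
    waiting = {t: len(preds) for t, (_, _, preds) in tasks.items()}
    for t, (_, _, preds) in tasks.items():
        for p in preds:
            succs[p].append(t)

    # Min-heaps holding the time at which each CPU / GPU becomes free.
    free = {"cpu": [0.0] * m, "gpu": [0.0] * k}
    # Available tasks, keyed by the time they were revealed (ties: task id).
    ready = [(0.0, t) for t, n in waiting.items() if n == 0]
    heapq.heapify(ready)
    revealed: Dict[str, float] = {}
    finish: Dict[str, float] = {}

    while ready:
        rel, t = heapq.heappop(ready)
        cpu_time, gpu_time, _ = tasks[t]
        # Online decision: uses only this task's own processing times.
        kind = "gpu" if cpu_time >= threshold * gpu_time else "cpu"
        duration = gpu_time if kind == "gpu" else cpu_time
        # Start on the earliest-available processor of the chosen type.
        start = max(rel, heapq.heappop(free[kind]))
        finish[t] = start + duration
        heapq.heappush(free[kind], finish[t])
        # Reveal successors whose predecessors have now all completed.
        for s in succs[t]:
            revealed[s] = max(revealed.get(s, 0.0), finish[t])
            waiting[s] -= 1
            if waiting[s] == 0:
                heapq.heappush(ready, (revealed[s], s))
    return max(finish.values(), default=0.0)
```

For instance, take $m = 4$ and $k = 1$, so the threshold is 2. Suppose task `a` (CPU time 3, GPU time 1) precedes `b` (3, 3) and `c` (4, 1), and both of these precede `d` (1, 1). Then `a` runs on the GPU over [0, 1], `b` on a CPU over [1, 4], `c` on the GPU over [1, 2], and `d` on a CPU over [4, 5], giving a makespan of 5.
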
We prove that no online algorithm can have a competitive ratio smaller than $\sqrt{m/k}$. We also study how adding flexibility on task processing, such as task migration or spoliation, or increasing the knowledge of the scheduler by providing it with information on the task graph, influences the lower bound. We provide a $(2\sqrt{m/k}+1)$-competitive algorithm as well as a tunable combination of a system-oriented heuristic and a competitive algorithm; this combination performs well in practice and has a competitive ratio in $\Theta(\sqrt{m/k})$. We also adapt all our results to the case of multiple types of processors. Finally, simulations on different sets of task graphs illustrate how the instance properties impact the performance of the studied algorithms and show that our proposed tunable algorithm performs the best among the online algorithms in almost all cases, and even achieves performance close to that of an offline algorithm.
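
The bounds quoted above can be compared with a short calculation. Write $r = \sqrt{m/k}$. Since $m \geq k$, we have $r \geq 1$. It follows that $2r + 1 \leq 3r < 4r$, so the new guarantee improves on the earlier $4r$. It also stays within a factor $(2r+1)/r = 2 + 1/r \leq 3$ of the lower bound $r$. The helper below evaluates these quantities. Its name, `ratio_bounds`, is hypothetical, and it is only arithmetic on the formulas stated in the abstract.

```python
import math


def ratio_bounds(m: int, k: int) -> dict:
    """Competitive-ratio figures from the abstract for m CPUs and k GPUs (m >= k >= 1)."""
    r = math.sqrt(m / k)  # r >= 1 because m >= k
    return {
        "lower_bound": r,            # no online algorithm beats sqrt(m/k)
        "new_algorithm": 2 * r + 1,  # the (2 sqrt(m/k) + 1)-competitive algorithm
        "previous": 4 * r,           # the earlier 4 sqrt(m/k) guarantee
        "gap": (2 * r + 1) / r,      # new guarantee / lower bound = 2 + 1/r <= 3
    }
```

For example, with $m = 64$ CPUs and $k = 4$ GPUs, $r = 4$. The lower bound is then 4, the new guarantee is 9, and the previous guarantee is 16.
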

[1] Salim Hariri, et al. Performance-Effective and Low-Complexity Task Scheduling for Heterogeneous Computing, 2002, IEEE Trans. Parallel Distributed Syst.

[2] Terry Cojean, et al. Scheduling of Linear Algebra Kernels on Multiple Heterogeneous Resources, 2016, 2016 IEEE 23rd International Conference on High Performance Computing (HiPC).

[3] Frédéric Vivien, et al. Online Scheduling of Task Graphs on Hybrid Platforms, 2018, Euro-Par.

[4] Lin Chen, et al. Online Scheduling of mixed CPU-GPU jobs, 2014, Int. J. Found. Comput. Sci.

[5] Denis Trystram, et al. Generic algorithms for scheduling applications on heterogeneous multi-core platforms, 2017, ArXiv.

[6] Safia Kedad-Sidhoum, et al. Scheduling Independent Moldable Tasks on Multi-Cores with GPUs, 2017, IEEE Transactions on Parallel and Distributed Systems.

[7] Cédric Augonnet, et al. Data-Aware Task Scheduling on Multi-accelerator Based Platforms, 2010, 2010 IEEE 16th International Conference on Parallel and Distributed Systems.

[8] Dror G. Feitelson, et al. Workload Modeling for Computer Systems Performance Evaluation, 2015.

[9] Joseph Y.-T. Leung, et al. Handbook of Scheduling: Algorithms, Models, and Performance Analysis, 2004.

[10] Fabián A. Chudak, et al. Approximation algorithms for precedence-constrained scheduling problems on parallel machines that run at different speeds, 1997, SODA '97.

[11] Ola Svensson, et al. Hardness of Precedence Constrained Scheduling on Identical Machines, 2011, SIAM J. Comput.

[12] Safia Kedad-Sidhoum, et al. Scheduling Tasks with Precedence Constraints on Hybrid Multi-core Machines, 2015, 2015 IEEE International Parallel and Distributed Processing Symposium Workshop.

[13] Cédric Augonnet, et al. StarPU: a unified platform for task scheduling on heterogeneous multicore architectures, 2011, Concurr. Comput. Pract. Exp.

[14] Safia Kedad-Sidhoum, et al. Scheduling independent tasks on multi-cores with GPU accelerators, 2015, Concurr. Comput. Pract. Exp.

[15] Olivier Beaumont, et al. Fast approximation algorithms for task-based runtime systems, 2018, Concurr. Comput. Pract. Exp.

[16] Csanád Imreh, et al. Scheduling Problems on Two Sets of Identical Machines, 2003, Computing.

[17] Frédéric Vivien, et al. Low-Cost Approximation Algorithms for Scheduling Independent Tasks on Hybrid Platforms, 2017, Euro-Par.

[18] Denis Trystram, et al. Generic Algorithms for Scheduling Applications on Hybrid Multi-core Machines, 2017, Euro-Par.

[19] Lilia Zaourar, et al. Approximation Algorithm for Scheduling Applications on Hybrid Multi-core Machines with Communications Delays, 2018, 2018 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW).

[20] Jack Dongarra, et al. Faster, Cheaper, Better – a Hybridization Methodology to Develop Linear Algebra Software for GPUs, 2010.

[21] Olivier Beaumont, et al. Approximation Proofs of a Fast and Efficient List Scheduling Algorithm for Task-Based Runtime Systems on Multicores and GPUs, 2017, 2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS).

[22] Emmanuel Agullo, et al. Are Static Schedules so Bad? A Case Study on Cholesky Factorization, 2016, 2016 IEEE International Parallel and Distributed Processing Symposium (IPDPS).

[23] Ronald L. Graham, et al. Bounds on Multiprocessing Timing Anomalies, 1969, SIAM Journal of Applied Mathematics.

[24] Hironori Kasahara, et al. A standard task graph set for fair evaluation of multiprocessor scheduling algorithms, 2002.