Influence of Tasks Duration Variability on Task-Based Runtime Schedulers

On current HPC platforms, individual nodes consist of heterogeneous processing resources such as GPUs and multicore CPUs. These resources share communication and storage resources, which induces complex co-scheduling effects and makes it hard to predict the exact duration of a task or of a communication. To cope with these issues, dynamic runtime schedulers such as StarPU have been developed. These systems base their decisions at runtime on the state of the platform and possibly on static task priorities computed offline. In this paper, our goal is to quantify performance variability on heterogeneous HPC nodes, focusing on very regular dense linear algebra kernels such as Cholesky and LU factorizations. We first evaluate the variability of the individual block-sized kernels. We then analyze the impact of this variability at the scale of a full application run under a dynamic runtime scheduler such as StarPU. The aim is to determine whether the strategies designed to cope with stragglers in MapReduce applications could be transferred to HPC systems, or whether the dynamic nature of runtime schedulers is enough to cope with actual performance variations, even in the presence of task dependencies.
