Paper information - Employing a study of the robustness metrics to assess the reliability of dynamic loop scheduling


To achieve the best performance, scientific applications are executed on parallel and distributed heterogeneous computing systems. These applications are often computationally intensive, data parallel, and irregular, and they usually contain large loops that exhibit non-uniform characteristics depending on their semantic structure during execution. These loops are the most data-parallel and computationally intensive part of the applications; the main focus of this work is therefore the scheduling of loop iterations. Improper scheduling of such loop iterations may lead to load imbalance, which is the dominant factor in performance degradation. A number of dynamic loop scheduling (DLS) techniques have been developed to address load imbalance for scientific applications in dynamically changing environments. The increasing demand for faster execution of iterations in larger simulations of more complex application models requires that DLS provide robust, on-demand performance on dynamically scalable, high-performance computing systems. To evaluate the robustness of these DLS techniques, two robustness metrics have previously been formulated to guarantee flexibility and resilience. In this work, we describe simulations of DLS techniques using Alea, a GridSim-based scheduling simulator. Based on the simulation results, we calculate the robustness metrics and show how to use them to determine the most robust DLS techniques.

Ioana Banicescu | Srishti Srivastava | Florina M. Ciorba
