Employing a study of the robustness metrics to assess the reliability of dynamic loop scheduling ∗

To achieve the best performance, scientific applications are executed on parallel and distributed heterogeneous computing systems. These applications are often computationally intensive, data parallel, and irregular, and they usually contain large loops whose iterations exhibit non-uniform characteristics during execution, depending on their semantic structure. These loops are the most data-parallel and computationally intensive parts of the applications; the main focus of this work is therefore on scheduling loop iterations. Improper scheduling of loop iterations may lead to load imbalance, which is the dominant factor in performance degradation. A number of dynamic loop scheduling (DLS) techniques have been developed to address load imbalance for scientific applications in dynamically changing environments. The increasing demand for faster execution of iterations in larger simulations of more complex application models requires that DLS techniques provide robust, on-demand performance on dynamically scalable, high-performance computing systems. To evaluate the robustness of these DLS techniques, two robustness metrics have previously been formulated, one addressing flexibility and one addressing resilience. In this work, we describe simulations of DLS techniques using Alea, a GridSim-based scheduling simulator. Based on the simulation results, we calculate the robustness metrics and show how to use them to determine the most robust DLS techniques.