Towards the Robustness of Dynamic Loop Scheduling on Large-Scale Heterogeneous Distributed Systems

Dynamic loop scheduling (DLS) algorithms provide application-level load balancing of loop iterates, with the goal of maximizing application performance on the underlying system. These methods use run-time information about the performance of the application's execution, whose irregularities change over time. Many DLS methods are based on probabilistic analyses and therefore account for unpredictable variations in application- and system-related parameters. Scheduling scientific and engineering applications on large-scale distributed systems (possibly shared with other users) makes the DLS problem even more challenging. Moreover, the likelihood of failures, such as processor or link failures, is high in such large-scale systems. In this paper, we employ the hierarchical approach for three DLS methods and propose metrics for quantifying their robustness with respect to variations in two parameters (load and processor failures) when scheduling irregular applications on large-scale heterogeneous distributed systems.
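To make the ideas above concrete, the following is a minimal sketch, not the method evaluated in this paper. It uses the widely used FAC2 variant of factoring (Hummel, Schonberg, and Flynn), a DLS rule derived from probabilistic analysis in which each batch assigns half of the remaining iterates, split into equal chunks across the P processors. It assumes every iterate has the same cost and that each processor's load is summarized by a single speed in iterates per time unit. The helper `within_tolerance` is a hypothetical robustness check in the spirit of the abstract: it asks whether the makespan under perturbed (for example, externally loaded) processor speeds stays within a factor `tau` of the nominal makespan. It is not the robustness metric proposed by the authors. All function names are illustrative.

```python
import heapq
import math


def fac2_chunks(n_iterates: int, n_procs: int) -> list[int]:
    """Chunk sizes for FAC2 factoring: each batch hands out half of the
    remaining iterates, split into n_procs equal chunks (at least 1 each)."""
    chunks = []
    remaining = n_iterates
    while remaining > 0:
        size = max(1, math.ceil(remaining / (2 * n_procs)))
        for _ in range(n_procs):
            if remaining == 0:
                break
            take = min(size, remaining)
            chunks.append(take)
            remaining -= take
    return chunks


def makespan(chunks: list[int], speeds: list[float]) -> float:
    """Simulate self-scheduling: the processor that becomes idle first takes
    the next chunk. speeds[i] is processor i's rate in iterates per time unit
    (assumption: all iterates cost the same)."""
    # Min-heap of (time the processor becomes free, processor index).
    free_at = [(0.0, i) for i in range(len(speeds))]
    heapq.heapify(free_at)
    finish = 0.0
    for c in chunks:
        t, i = heapq.heappop(free_at)
        t += c / speeds[i]
        finish = max(finish, t)
        heapq.heappush(free_at, (t, i))
    return finish


def within_tolerance(chunks: list[int], nominal_speeds: list[float],
                     perturbed_speeds: list[float], tau: float) -> bool:
    """Hypothetical robustness check: does the makespan under perturbed
    speeds stay within tau times the nominal makespan?"""
    return makespan(chunks, perturbed_speeds) <= tau * makespan(chunks, nominal_speeds)


if __name__ == "__main__":
    chunks = fac2_chunks(100, 4)
    nominal = [1.0, 1.0, 1.0, 1.0]
    loaded = [0.5, 1.0, 1.0, 1.0]  # processor 0 slowed by external load
    print(chunks)
    print(makespan(chunks, nominal), makespan(chunks, loaded))
    print(within_tolerance(chunks, nominal, loaded, tau=1.25))
```

Because chunk sizes shrink geometrically, a slowed processor that is still finishing a large early chunk delays only part of the remaining work, which is the intuition behind factoring's tolerance to load variation. A processor failure could be modeled in the same toy setting by dropping a processor from `speeds` partway through, although rescheduling the lost chunks is not shown here.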
