Costs and Benefits of Load Sharing in the Computational Grid

We analyze the costs and benefits of load sharing of parallel jobs in the computational grid. We begin with a workload generation model that captures the essential properties of parallel jobs and use it as input to a grid simulation model. We run experiments for both homogeneous and heterogeneous grids and measure average job slowdown with respect to both local and remote jobs. Under some reasonable assumptions about the migration policy, load sharing proves beneficial when the grid is homogeneous, but it can adversely affect job slowdown on lightly loaded machines in a heterogeneous grid. The benefits of load sharing do not scale well with the number of sites in a grid: small to modest-size grids can employ load sharing as effectively as large-scale grids. We also present and evaluate an effective scheduling heuristic for migrating a job within the grid.
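
To make the metric concrete, the sketch below computes average job slowdown separately for local and remote jobs. It assumes the common definition of slowdown as response time (wait time plus run time) divided by run time; the abstract does not say which variant is used (for example, bounded slowdown), so that definition, the `Job` type, its field names, and the sample values are all illustrative assumptions rather than details from the paper.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Job:
    # All fields are hypothetical names chosen for this illustration.
    wait_time: float  # seconds spent queued before the job starts
    run_time: float   # seconds spent executing
    remote: bool      # True if the job was migrated from another site


def slowdown(job: Job) -> float:
    """Response time divided by run time (assumed definition)."""
    if job.run_time <= 0:
        raise ValueError("run_time must be positive")
    return (job.wait_time + job.run_time) / job.run_time


def average_slowdown(jobs: Iterable[Job]) -> float:
    """Arithmetic mean of per-job slowdown."""
    values = [slowdown(j) for j in jobs]
    if not values:
        raise ValueError("no jobs to average")
    return sum(values) / len(values)


# Made-up sample data: two local jobs and one migrated (remote) job.
jobs = [
    Job(wait_time=0.0, run_time=100.0, remote=False),
    Job(wait_time=100.0, run_time=100.0, remote=False),
    Job(wait_time=300.0, run_time=100.0, remote=True),
]
local_avg = average_slowdown(j for j in jobs if not j.remote)
remote_avg = average_slowdown(j for j in jobs if j.remote)
```

Reporting the two averages separately mirrors the abstract's distinction between local and remote jobs: a migration policy can lower one while raising the other.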
