Agent-based grid load balancing using performance-driven task scheduling

Load balancing is a key concern when developing parallel and distributed computing applications. The emergence of computational grids extends this problem, since cross-domain and large-scale scheduling must also be considered. In this paper, an agent-based grid management infrastructure is coupled with a performance-driven task scheduler developed for local grid load balancing. Each local grid scheduler uses predictive application performance data and an iterative heuristic algorithm to balance load across multiple processing nodes. At a higher level, a hierarchy of homogeneous agents is used to represent multiple grid resources. Agents cooperate with each other to balance workload in the global grid environment using service advertisement and discovery mechanisms. A case study with corresponding experimental results demonstrates that both local schedulers and agents contribute to overall grid load balancing, which significantly improves grid application execution performance and resource utilisation.
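The local scheduling idea summarised above (predicted execution times driving an iterative heuristic) can be illustrated with a minimal sketch. This is not the scheduler described in the paper: a simple hill-climbing heuristic stands in for whichever iterative heuristic the authors use, and the names `makespan`, `balance`, and the `predicted` table are hypothetical, introduced only for illustration. The sketch assumes each task has one predicted run time per node and that the goal is to minimise the finishing time of the busiest node.

```python
from typing import Dict, List


def makespan(assignment: Dict[str, int],
             predicted: Dict[str, List[float]],
             n_nodes: int) -> float:
    """Finishing time of the busiest node under a given task-to-node mapping."""
    loads = [0.0] * n_nodes
    for task, node in assignment.items():
        loads[node] += predicted[task][node]
    return max(loads)


def balance(predicted: Dict[str, List[float]],
            n_nodes: int,
            max_rounds: int = 100) -> Dict[str, int]:
    """Hill-climbing stand-in for an iterative scheduling heuristic (assumption)."""
    # Start by placing every task on the node where it is predicted to run fastest.
    assignment = {
        task: min(range(n_nodes), key=lambda n: times[n])
        for task, times in predicted.items()
    }
    best = makespan(assignment, predicted, n_nodes)

    for _ in range(max_rounds):
        improved = False
        for task in predicted:
            current = assignment[task]
            for node in range(n_nodes):
                if node == current:
                    continue
                # Tentatively move the task and keep the move only if it helps.
                assignment[task] = node
                cost = makespan(assignment, predicted, n_nodes)
                if cost < best:
                    best, current, improved = cost, node, True
            assignment[task] = current
        if not improved:
            break  # no single move lowers the makespan any further
    return assignment


# Hypothetical predictions: four identical tasks, node 0 twice as fast as node 1.
predicted = {name: [4.0, 8.0] for name in ("a", "b", "c", "d")}
schedule = balance(predicted, n_nodes=2)
print(schedule, makespan(schedule, predicted, 2))
```

In this example every task initially lands on the faster node, giving a makespan of 16; moving a single task to the slower node lowers it to 12, after which no further single move helps. A global layer of cooperating agents, as the abstract describes, would sit above several such local schedulers and is not modelled here.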
