SLICK: A Coordinated Job Allocation Technique for Inter-Grid Architectures

Large-scale Grid computing systems are often organized as an inter-Grid architecture, in which multiple Grid domains are interconnected through their local brokers. In this context, the main challenge is to devise job scheduling policies that achieve goals such as global load balancing while preserving the local policies of the individual Grids. This paper presents SLICK, a scalable resource discovery and job scheduling technique for broker-based interconnected Grid domains. In this technique, local scheduling policies are left untouched, while inter-Grid scheduling decisions are handled by a separate scheduler installed on the local brokers. To make suitable scheduling decisions, brokers must collect information about current resource usage in other domains. To this end, brokers periodically exchange their local domain's resource usage information with their neighbors. In large-scale systems, this periodic exchange naturally generates a significant amount of traffic. To keep the broker overlay from becoming overloaded, we introduce an aggregation technique that reduces and combines worker resource usage information. We compared SLICK with three other techniques through simulation of a 50,000-node Grid divided into 512 domains, using synthetic job sequences with a total load of 80,000 jobs. Our results show that SLICK maintains overall throughput and load balance better than previous techniques.
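The abstract does not specify SLICK's message format or placement rule, so the following is only a minimal sketch of the general idea it describes: each broker condenses its workers' state into one fixed-size summary, pushes that summary to its neighbors in a periodic round, and uses the summaries it has received when deciding where to place a job. The names `DomainSummary`, `Broker`, `summarize`, `exchange`, and `place`, the "free slots" metric, and the "most free slots wins" rule are all assumptions made for illustration, not the paper's actual design.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class DomainSummary:
    """Fixed-size digest of one domain (hypothetical format)."""
    domain: str
    workers: int
    free_slots: int


@dataclass
class Broker:
    name: str
    # Free job slots per worker in the local domain (assumed metric).
    worker_free_slots: dict[str, int]
    neighbors: list[Broker] = field(default_factory=list)
    # Latest summary received from each neighboring domain.
    known: dict[str, DomainSummary] = field(default_factory=dict)

    def summarize(self) -> DomainSummary:
        # Aggregate per-worker state into a single record, so the
        # message size does not grow with the number of workers.
        return DomainSummary(
            self.name,
            len(self.worker_free_slots),
            sum(self.worker_free_slots.values()),
        )

    def exchange(self) -> None:
        # One periodic round: push the local summary to every neighbor.
        summary = self.summarize()
        for neighbor in self.neighbors:
            neighbor.known[summary.domain] = summary

    def place(self) -> str:
        # Assumed rule: keep the job locally when a slot is free,
        # otherwise forward it to the known domain with most free slots.
        if self.summarize().free_slots > 0:
            return self.name
        best = max(self.known.values(), key=lambda s: s.free_slots, default=None)
        if best is None or best.free_slots == 0:
            return self.name  # no better option known; queue locally
        return best.domain


# Three domains; A is saturated, B has 3 free slots, C has 1.
a = Broker("A", {"a1": 0, "a2": 0})
b = Broker("B", {"b1": 2, "b2": 1})
c = Broker("C", {"c1": 1})
a.neighbors = [b, c]
b.neighbors = [a]
c.neighbors = [a]

for broker in (a, b, c):
    broker.exchange()

print(a.place())  # B
print(b.place())  # B
```

In this sketch each broker sends one three-field record per neighbor per round, regardless of how many workers its domain contains, which is the kind of traffic reduction the aggregation step is meant to provide. Local scheduling inside a domain is deliberately not modeled, mirroring the separation between local and inter-Grid decisions described above.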
