Evaluation of Meta-scheduler Architectures and Task Assignment Policies for High Throughput Computing

In this paper we present a model and a simulator for a system of many clusters, each made up of heterogeneous PCs on a local network. The clusters are assumed to be connected to each other through a global network, and each cluster is managed by a local scheduler that is shared by many users. We first validate the simulator by comparing its experimental results with the analytical results for an M/M/4 queuing system; this comparison indicates that the simulator is consistent. We then compare it with a real batch system and obtain an average error of 10.5% for the response time and 12% for the makespan. We conclude that the simulator is realistic and describes the behaviour of a large-scale system well, so it can be used to study scheduling in a high throughput context. Finally, we justify our decentralized, adaptive and opportunistic approach by comparing it with a centralized approach in such a context.
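As an illustration of the analytical side of the M/M/4 validation, the following minimal sketch computes the mean response time of an M/M/c queue from the standard Erlang C formula and the relative error of a measured value against it. The function names, the arrival rate `lam` and the service rate `mu` are assumptions chosen for illustration; the values used in the paper's experiments are not given here.

```python
from math import factorial


def erlang_c(c, lam, mu):
    """Probability that an arriving job must wait in an M/M/c queue."""
    a = lam / mu          # offered load
    rho = a / c           # utilisation per server
    if rho >= 1:
        raise ValueError("unstable queue: lam must be smaller than c * mu")
    tail = a ** c / factorial(c) / (1 - rho)
    head = sum(a ** k / factorial(k) for k in range(c))
    return tail / (head + tail)


def mmc_mean_response_time(c, lam, mu):
    """Mean response time (waiting time plus service time) of an M/M/c queue."""
    return erlang_c(c, lam, mu) / (c * mu - lam) + 1 / mu


def relative_error(measured, expected):
    """Relative deviation of a measured value from the analytical one."""
    return abs(measured - expected) / expected


# Hypothetical rates: 3 jobs per time unit arriving at 4 servers,
# each serving 1 job per time unit.
analytical = mmc_mean_response_time(c=4, lam=3.0, mu=1.0)
```

A simulated mean response time for the same rates can then be passed to `relative_error` together with `analytical` to quantify how closely the simulator matches the theory.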
