Towards an opportunistic grid scheduling infrastructure based on tuple spaces

One of the main issues in using heterogeneous grid resources efficiently and effectively is scheduling. Scheduling in a grid system is challenging, mainly because of the dynamic nature of the grid. Schedulers on traditional grid infrastructures rely on an information service that provides information about resource capacities and availability. However, in an asynchronous distributed system such as a grid, providing up-to-date information about resources is difficult. Current scheduling algorithms therefore make decisions without fully accurate information about resources, which can lead to inefficient schedules. This paper proposes a new scheduling infrastructure for grids in which resources select the tasks they execute, instead of the traditional approach in which schedulers find resources for the tasks. The proposed approach allows scheduling decisions to be made, at any time, with up-to-date and accurate information. Moreover, the infrastructure provides mechanisms for fault-tolerant scheduling. The proposed infrastructure is based mainly on the tuple space coordination model. In our evaluation study, a number of experiments with various simulation settings demonstrated the practicability of the proposed infrastructure.
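
The resource-driven selection described above can be illustrated with a minimal sketch. It is not the paper's implementation: the names `TupleSpace`, `Resource`, `pull_task`, and `run_demo`, the task tuple layout `("task", id, min_cpu_mhz, min_mem_mb)`, and the use of a predicate in place of classic Linda template matching are all assumptions made for illustration. The sketch only shows the inversion of control: tasks are written to a shared space, and each resource withdraws a task that fits its own current state.

```python
import threading


class TupleSpace:
    """Toy in-memory tuple space with Linda-style out and non-blocking take."""

    def __init__(self):
        self._tuples = []
        self._lock = threading.Lock()

    def out(self, tup):
        """Write a tuple into the space."""
        with self._lock:
            self._tuples.append(tup)

    def inp(self, predicate):
        """Atomically remove and return the first tuple matching predicate.

        Returns None when nothing matches (non-blocking variant of 'in').
        A predicate is a simplification of Linda template matching.
        """
        with self._lock:
            for i, tup in enumerate(self._tuples):
                if predicate(tup):
                    return self._tuples.pop(i)
        return None


class Resource:
    """Hypothetical grid resource that selects its own work."""

    def __init__(self, name, cpu_mhz, free_mem_mb):
        self.name = name
        self.cpu_mhz = cpu_mhz
        self.free_mem_mb = free_mem_mb

    def pull_task(self, space):
        """Take one task whose requirements fit this resource's local state."""

        def can_run(tup):
            tag, _task_id, min_cpu, min_mem = tup
            return (tag == "task"
                    and min_cpu <= self.cpu_mhz
                    and min_mem <= self.free_mem_mb)

        return space.inp(can_run)


def run_demo():
    space = TupleSpace()
    # Assumed tuple layout: ("task", task_id, min_cpu_mhz, min_mem_mb)
    space.out(("task", "t1", 2000, 4096))
    space.out(("task", "t2", 500, 256))
    space.out(("task", "t3", 1000, 1024))

    small = Resource("small", cpu_mhz=1000, free_mem_mb=1024)
    big = Resource("big", cpu_mhz=3000, free_mem_mb=8192)

    assignments = []
    for res in (small, big, small, big):
        task = res.pull_task(space)
        if task is not None:
            assignments.append((res.name, task[1]))
    return assignments


if __name__ == "__main__":
    print(run_demo())
```

In this sketch no central scheduler or information service is consulted: the matching decision uses only values the resource holds locally at the moment it asks for work. Fault tolerance, blocking reads, and distribution of the space are deliberately left out.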