GRIDTS: A New Approach for Fault-Tolerant Scheduling in Grid Computing

This paper proposes GRIDTS, a grid infrastructure in which the resources select the tasks they execute, in contrast to traditional infrastructures, where schedulers find resources for the tasks. This approach allows scheduling decisions to be made with up-to-date information about the resources, which is difficult to obtain in traditional infrastructures. Moreover, GRIDTS provides fault-tolerant scheduling by combining a set of fault tolerance techniques to cope with crash faults in components of the system. The solution is based mainly on a tuple space, which supports the scheduling and also provides support for the fault tolerance mechanisms.
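The abstract combines two ideas: resources pull tasks from a shared tuple space, and that space is also used to recover from worker crashes. A small single-process sketch can make both concrete. Everything below is hypothetical and for illustration only. The `TupleSpace` class, the tuple layouts (`("TASK", id, min_mem_mb, payload)`, `("RUNNING", id, worker, deadline, task)`, `("RESULT", id, value)`), predicate fields in templates, and the lease-based re-queueing are assumptions. They are not GRIDTS's actual interface or its specific combination of fault tolerance techniques. The operation names `out`, `rdp` and `inp` follow the usual Linda convention.

```python
from typing import Optional


class TupleSpace:
    """Minimal single-process tuple space (hypothetical sketch, not GRIDTS's API).

    A template field may be None (wildcard), a callable (predicate applied to
    the field), or any other value (the field must be equal to it).
    """

    def __init__(self) -> None:
        self._tuples: list = []

    @staticmethod
    def _matches(template: tuple, tup: tuple) -> bool:
        if len(template) != len(tup):
            return False
        for t, v in zip(template, tup):
            if t is None:
                continue
            if callable(t):
                if not t(v):
                    return False
            elif t != v:
                return False
        return True

    def out(self, tup: tuple) -> None:
        """Insert a tuple into the space."""
        self._tuples.append(tup)

    def rdp(self, template: tuple) -> Optional[tuple]:
        """Non-blocking read: return a matching tuple without removing it."""
        return next((x for x in self._tuples if self._matches(template, x)), None)

    def inp(self, template: tuple) -> Optional[tuple]:
        """Non-blocking take: remove and return a matching tuple."""
        for i, x in enumerate(self._tuples):
            if self._matches(template, x):
                return self._tuples.pop(i)
        return None


def worker_step(ts, worker_id, free_mem_mb, now, lease_s):
    """A resource pulls a task it can run now, judged by its own current state."""
    task = ts.inp(("TASK", None, lambda need: need <= free_mem_mb, None))
    if task is None:
        return None
    task_id = task[1]
    # Hypothetical lease tuple: lets another process re-queue the task if this
    # worker crashes before finishing. (Take + out are atomic here only because
    # this sketch is single-threaded.)
    ts.out(("RUNNING", task_id, worker_id, now + lease_s, task))
    return task_id


def complete(ts, task_id, worker_id, result):
    """Publish a result only if this worker still holds the task's lease."""
    if ts.inp(("RUNNING", task_id, worker_id, None, None)) is None:
        return False  # lease expired and the task was re-queued: drop stale result
    ts.out(("RESULT", task_id, result))
    return True


def recover_expired(ts, now):
    """Re-publish every task whose lease deadline has passed; return how many."""
    count = 0
    while (lease := ts.inp(("RUNNING", None, None, lambda d: d < now, None))) is not None:
        ts.out(lease[4])  # put the original TASK tuple back in the space
        count += 1
    return count


if __name__ == "__main__":
    ts = TupleSpace()
    ts.out(("TASK", "t1", 512, "render"))
    ts.out(("TASK", "t2", 4096, "simulate"))
    print(worker_step(ts, "A", 1024, now=0.0, lease_s=10.0))  # A has room only for t1
    print(worker_step(ts, "B", 8192, now=0.0, lease_s=10.0))  # B takes t2, then "crashes"
    complete(ts, "t1", "A", "ok")
    print(recover_expired(ts, now=11.0))  # B's lease expired, so t2 is re-queued
```

The selection decision is made by the worker using its own free memory at the moment it asks. This is the "up-to-date information" point of the abstract. A real deployment would also need the tuple space itself to survive crashes (for example through replication) and would need taking a task and writing its lease to be atomic. This sketch models neither.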
