Use-Cases and Requirements for Grid Checkpoint and Recovery

This document describes use-cases to be addressed by the Grid Checkpoint and Recovery Working Group (GridCPR WG). The working group also uses these scenarios to derive a set of requirements for the standards it develops.
