Cruz: Application-Transparent Distributed Checkpoint-Restart on Standard Operating Systems
Jose Renato Santos | Yoshio Turner | G. John Janakiraman | Dinesh Subhraveti
[1] Peter Alan Lee, et al. Fault Tolerance, 1990, Dependable Computing and Fault-Tolerant Systems.
[2] Leslie Lamport, et al. Distributed snapshots: determining global states of distributed systems, 1985, TOCS.
[3] Willy Zwaenepoel, et al. On the use and implementation of message logging, 1994, Proceedings of IEEE 24th International Symposium on Fault-Tolerant Computing.
[4] Pankaj Jalote, et al. Fault tolerance in distributed systems, 1994.
[5] W. Richard Stevens, et al. TCP/IP Illustrated, Volume 1: The Protocols, 1994.
[6] Kai Li, et al. Libckpt: Transparent Checkpointing under UNIX, 1995, USENIX.
[7] Jack Dongarra, et al. PVM: Parallel virtual machine: a users' guide and tutorial for networked parallel computing, 1995.
[8] Jonathan Walpole, et al. MPVM: A Migration Transparent Version of PVM, 1995, Comput. Syst.
[9] Georg Stellner, et al. CoCheck: checkpointing and process migration for MPI, 1996, Proceedings of International Conference on Parallel Processing.
[10] Jack Dongarra, et al. MPI: The Complete Reference, 1996.
[11] Miron Livny, et al. Checkpoint and Migration of UNIX Processes in the Condor Distributed Processing System, 1997.
[12] Nuno Neves, et al. RENEW: a tool for fast and efficient implementation of checkpoint protocols, 1998, Digest of Papers. Twenty-Eighth Annual International Symposium on Fault-Tolerant Computing (Cat. No.98CB36224).
[13] L. Alvisi, et al. A Survey of Rollback-Recovery Protocols, 2002.
[14] Jason Nieh, et al. Proceedings of the 5th Symposium on Operating Systems Design and Implementation, 2002.
[15] Ian T. Foster, et al. Grid Services for Distributed System Integration, 2002, Computer.
[16] Jason Duell, et al. Requirements for Linux Checkpoint/Restart, 2002.
[17] Jeffrey C. Mogul, et al. Unveiling the transport, 2004, CCRV.
[18] J. Duell. The design and implementation of Berkeley Lab's linux checkpoint/restart, 2005.
[19] Jason Duell, et al. The Lam/Mpi Checkpoint/Restart Framework: System-Initiated Checkpointing, 2005, Int. J. High Perform. Comput. Appl.