Distributed Checkpointing on Clusters with Dynamic Striping and Staggering
[1] Hai Jin,et al. Orthogonal Striping and Mirroring in Distributed RAID for I/O-Centric Cluster Computing , 2002, IEEE Trans. Parallel Distributed Syst..
[2] Willy Zwaenepoel,et al. On the use and implementation of message logging , 1994, Proceedings of IEEE 24th International Symposium on Fault-Tolerant Computing.
[3] Jian Xu,et al. Necessary and Sufficient Conditions for Consistent Global Snapshots , 1995, IEEE Trans. Parallel Distributed Syst..
[4] Kai Li,et al. Diskless Checkpointing , 1998, IEEE Trans. Parallel Distributed Syst..
[5] Luís Moura Silva,et al. Global checkpointing for distributed programs , 1992, Proceedings 11th Symposium on Reliable Distributed Systems.
[6] Junguk L. Kim,et al. An Efficient Protocol for Checkpointing Recovery in Distributed Systems , 1993, IEEE Trans. Parallel Distributed Syst..
[7] Jian Xu,et al. Adaptive independent checkpointing for reducing rollback propagation , 1993, Proceedings of 1993 5th IEEE Symposium on Parallel and Distributed Processing.
[8] Michael Allen,et al. Parallel programming: techniques and applications using networked workstations and parallel computers , 1998.
[9] Willy Zwaenepoel,et al. Manetho: Transparent Rollback-Recovery with Low Overhead, Limited Rollback, and Fast Output Commit , 1992, IEEE Trans. Computers.
[10] Nitin H. Vaidya,et al. A case for two-level distributed recovery schemes , 1995, SIGMETRICS '95/PERFORMANCE '95.
[11] Yong Deng,et al. Checkpointing and rollback-recovery algorithms in distributed systems , 1994, J. Syst. Softw..
[12] Jeffrey F. Naughton,et al. Low-Latency, Concurrent Checkpointing for Parallel Programs , 1994, IEEE Trans. Parallel Distributed Syst..
[13] Hai Jin,et al. Designing SSI clusters with hierarchical checkpointing and single I/O space , 1999, IEEE Concurr..
[14] Kai Li,et al. Libckpt: Transparent Checkpointing under UNIX , 1995, USENIX.
[15] Mukesh Singhal,et al. Low-Cost Checkpointing and Failure Recovery in Mobile Computing Systems , 1996, IEEE Trans. Parallel Distributed Syst..
[16] Leslie Lamport,et al. Distributed snapshots: determining global states of distributed systems , 1985, TOCS.
[17] David A. Patterson,et al. Designing Disk Arrays for High Data Reliability , 1993, J. Parallel Distributed Comput..
[18] Kishor S. Trivedi,et al. Reliability Analysis of Redundant Arrays of Inexpensive Disks , 1993, J. Parallel Distributed Comput..
[19] Hai Jin,et al. Reliable cluster computing with a new checkpointing RAID-x architecture , 2000, Proceedings 9th Heterogeneous Computing Workshop (HCW 2000).
[20] Richard Koo,et al. Checkpointing and Rollback-Recovery for Distributed Systems , 1986, IEEE Transactions on Software Engineering.
[21] Mukesh Singhal,et al. On Coordinated Checkpointing in Distributed Systems , 1998, IEEE Trans. Parallel Distributed Syst..
[22] Nitin H. Vaidya,et al. Staggered Consistent Checkpointing , 1999, IEEE Trans. Parallel Distributed Syst..