Supporting Cost-Effective Fault Tolerance in Distributed Message-Passing Applications with File Operations
[1] Yi-Min Wang, et al. Checkpointing and its applications, 1995, Twenty-Fifth International Symposium on Fault-Tolerant Computing. Digest of Papers.
[2] Yuval Tamir, et al. Coordinated checkpointing-rollback error recovery for distributed shared memory multicomputers, 1994, Proceedings of IEEE 13th Symposium on Reliable Distributed Systems.
[3] Richard D. Schlichting, et al. Fail-stop processors: an approach to designing fault-tolerant computing systems, 1981, TOCS.
[4] Jonathan Walpole, et al. MIST: PVM with Transparent Migration and Checkpointing, 1995.
[5] Willy Zwaenepoel, et al. The performance of consistent checkpointing, 1992, Proceedings 11th Symposium on Reliable Distributed Systems.
[6] Richard D. Schlichting, et al. Fail-Stop Processors: An Approach to Designing Computing Systems, 1983.
[7] Friedemann Mattern, et al. Efficient Algorithms for Distributed Snapshots and Global Virtual Time Approximation, 1993, J. Parallel Distributed Comput.
[8] Kai Li, et al. Libckpt: Transparent Checkpointing under UNIX, 1995, USENIX.
[9] Jeffrey F. Naughton, et al. An efficient checkpointing method for multicomputers with wormhole routing, 1991, International Journal of Parallel Programming.
[10] David B. Johnson, et al. Recovery in Distributed Systems Using Optimistic Message Logging and Checkpointing, 1988, J. Algorithms.
[11] James S. Plank. Efficient checkpointing on MIMD architectures, 1993.
[12] W. Kent Fuchs, et al. Lazy checkpoint coordination for bounding rollback propagation, 1992, Proceedings of 1993 IEEE 12th Symposium on Reliable Distributed Systems.
[13] R. E. Strom, et al. A recoverable object store, 1988, Proceedings of the Twenty-First Annual Hawaii International Conference on System Sciences. Volume II: Software track.
[14] Kai Li, et al. ickp: a consistent checkpointer for multicomputers, 1994, IEEE Parallel & Distributed Technology: Systems & Applications.
[15] Gernot Heiser, et al. Checkpointing and recovery for distributed shared memory applications, 1995, Proceedings of International Workshop on Object Orientation in Operating Systems.
[16] Robert E. Strom, et al. Optimistic recovery in distributed systems, 1985, TOCS.
[17] Yennun Huang, et al. Software Implemented Fault Tolerance Technologies and Experience, 1993, FTCS.
[18] Micah Beck, et al. Compiler-Assisted Memory Exclusion for Fast Checkpointing, 1995.
[19] Eric A. Brewer, et al. An Algorithm for Concurrent Search Trees, 1991, International Conference on Parallel Processing.
[20] W. Kent Fuchs, et al. Compiler-assisted full checkpointing, 1994, Softw. Pract. Exp.
[21] Leslie Lamport, et al. The Byzantine Generals Problem, 1982, TOPL.
[22] Leslie Lamport, et al. Distributed snapshots: determining global states of distributed systems, 1985, TOCS.
[23] Michael Litzkow, et al. Supporting checkpointing and process migration outside the UNIX kernel, 1999.
[24] David F. Bacon, et al. Volatile logging in n-fault-tolerant distributed systems, 1988, The Eighteenth International Symposium on Fault-Tolerant Computing. Digest of Papers.
[25] Georg Stellner, et al. CoCheck: checkpointing and process migration for MPI, 1996, Proceedings of International Conference on Parallel Processing.
[26] Vijay K. Garg, et al. How to recover efficiently and asynchronously when optimism fails, 1996, Proceedings of 16th International Conference on Distributed Computing Systems.
[27] L. Alvisi, et al. Nonblocking and Orphan-Free Message Logging Protocols, 1995, Twenty-Fifth International Symposium on Fault-Tolerant Computing, 1995, 'Highlights from Twenty-Five Years'.
[28] Andrew S. Tanenbaum, et al. Computer Networks, 1981.
[29] Jinsong Ouyang. Supporting cost-effective fault tolerance in distributed applications with file operations, 1997.
[30] Richard Koo, et al. Checkpointing and Rollback-Recovery for Distributed Systems, 1986, IEEE Transactions on Software Engineering.
[31] Jack J. Dongarra, et al. Algorithm-based diskless checkpointing for fault tolerant matrix operations, 1995, Twenty-Fifth International Symposium on Fault-Tolerant Computing. Digest of Papers.
[32] Sean W. Smith, et al. Completely asynchronous optimistic recovery with minimal rollbacks, 1995, Twenty-Fifth International Symposium on Fault-Tolerant Computing. Digest of Papers.
[33] Gilbert Cabillic, et al. The performance of consistent checkpointing in distributed shared memory systems, 1995, Proceedings. 14th Symposium on Reliable Distributed Systems.
[34] Peter Steenkiste, et al. Fail-Safe PVM: A Portable Package for Distributed Programming with Transparent Recovery, 1993.
[35] D. Manivannan, et al. A low-overhead recovery technique using quasi-synchronous checkpointing, 1996, Proceedings of 16th International Conference on Distributed Computing Systems.
[36] Ten-Hwang Lai, et al. On Distributed Snapshots, 1987, Inf. Process. Lett.
[37] Gernot Heiser, et al. Libra: A Library for Reliable Distributed Applications, 1996, PDPTA.
[38] Georg Stellner. Consistent Checkpoints of PVM Applications, 1994.
[39] Jeffrey F. Naughton, et al. Low-Latency, Concurrent Checkpointing for Parallel Programs, 1994, IEEE Trans. Parallel Distributed Syst.
[40] Jeffrey F. Naughton, et al. Checkpointing multicomputer applications, 1991, Proceedings Tenth Symposium on Reliable Distributed Systems.