Egida: an extensible toolkit for low-overhead fault-tolerance
[1] David B. Johnson,et al. Recovery in Distributed Systems Using Optimistic Message Logging and Checkpointing , 1988, J. Algorithms.
[2] A. Prasad Sistla,et al. Efficient distributed recovery using message logging , 1989, PODC '89.
[3] Jack Dongarra,et al. MPI: The Complete Reference , 1996 .
[4] Yi-Min Wang,et al. COMERA: COM Extensible Remoting Architecture , 1998, COOTS.
[5] Lorenzo Alvisi,et al. An analysis of communication induced checkpointing , 1999, Digest of Papers. Twenty-Ninth Annual International Symposium on Fault-Tolerant Computing (Cat. No.99CB36352).
[6] Yennun Huang,et al. Software Implemented Fault Tolerance Technologies and Experience , 1993, FTCS.
[7] David B. Johnson,et al. Sender-Based Message Logging , 1987 .
[8] Ewing L. Lusk,et al. Monitors, Messages, and Clusters: The p4 Parallel Programming System , 1994, Parallel Comput..
[9] Danny Dolev,et al. The Transis approach to high availability cluster communication , 1996, CACM.
[10] David F. Bacon,et al. Volatile logging in n-fault-tolerant distributed systems , 1988, [1988] The Eighteenth International Symposium on Fault-Tolerant Computing. Digest of Papers.
[11] Georg Stellner,et al. CoCheck: checkpointing and process migration for MPI , 1996, Proceedings of International Conference on Parallel Processing.
[12] S. Venkatesan,et al. Crash recovery with little overhead , 1991, [1991] Proceedings. 11th International Conference on Distributed Computing Systems.
[13] Vijay K. Garg,et al. How to recover efficiently and asynchronously when optimism fails , 1996, Proceedings of 16th International Conference on Distributed Computing Systems.
[14] Kai Li,et al. Libckpt: Transparent Checkpointing under UNIX , 1995, USENIX.
[15] L. Alvisi,et al. Nonblocking and Orphan-Free Message Logging Protocols , 1995, Twenty-Fifth International Symposium on Fault-Tolerant Computing, 1995, ' Highlights from Twenty-Five Years'..
[16] Roy H. Campbell,et al. Quarterware for middleware , 1998, Proceedings. 18th International Conference on Distributed Computing Systems (Cat. No.98CB36183).
[17] Larry L. Peterson,et al. The x-Kernel: An Architecture for Implementing Network Protocols , 1991, IEEE Trans. Software Eng..
[18] Kenneth P. Birman,et al. Reliable communication in the presence of failures , 1987 .
[19] Robbert van Renesse,et al. Design and Performance of Horus: A Lightweight Group Communications System , 1994 .
[20] Leslie Lamport. Time, clocks, and the ordering of events in a distributed system , 1978 .
[21] Kenneth P. Birman,et al. Reliable communication in the presence of failures , 1987, TOCS.
[22] Harrick M. Vin,et al. Hybrid Message Logging Protocols for Fast Recovery , 1998 .
[23] Leslie Lamport,et al. Time, clocks, and the ordering of events in a distributed system , 1978, CACM.
[24] Anthony Skjellum,et al. A High-Performance, Portable Implementation of the MPI Message Passing Interface Standard , 1996, Parallel Comput..
[25] Jonathan Walpole,et al. MIST: PVM with Transparent Migration and Checkpointing , 1995 .
[26] William Gropp,et al. User's Guide for mpich, a Portable Implementation of MPI Version 1.2.2 , 1996 .
[27] Willy Zwaenepoel,et al. Manetho: Transparent Rollback-Recovery with Low Overhead, Limited Rollback, and Fast Output Commit , 1992, IEEE Trans. Computers.
[28] Anita Borg,et al. A message system supporting fault tolerance , 1983, SOSP '83.
[29] Carl Kesselman,et al. Generalized communicators in the Message Passing Interface , 1996, Proceedings. Second MPI Developer's Conference.
[30] Harrick M. Vin,et al. The Cost of Recovery in Message Logging Protocols , 2000, IEEE Trans. Knowl. Data Eng..
[31] Nuno Neves,et al. RENEW: a tool for fast and efficient implementation of checkpoint protocols , 1998, Digest of Papers. Twenty-Eighth Annual International Symposium on Fault-Tolerant Computing (Cat. No.98CB36224).
[32] Douglas C. Schmidt,et al. ADAPTIVE: A dynamically assembled protocol transformation, integration and evaluation environment , 1993, Concurr. Pract. Exp..
[33] Matti A. Hiltunen,et al. A Configurable Membership Service , 1998, IEEE Trans. Computers.
[34] Peter Steenkiste,et al. Fail-Safe PVM: A Portable Package for Distributed Programming with Transparent Recovery , 1993 .
[35] Robert E. Strom,et al. Optimistic recovery in distributed systems , 1985, TOCS.
[36] Ravishankar K. Iyer,et al. An object-oriented testbed for the evaluation of checkpointing and recovery systems , 1997, Proceedings of IEEE 27th International Symposium on Fault Tolerant Computing.
[37] David L. Presotto,et al. Publishing: a reliable broadcast communication mechanism , 1983, SOSP '83.