A Survey of Rollback-Recovery Protocols
[1] C. V. Ramamoorthy,et al. Rollback and Recovery Strategies for Computer Programs , 1972, IEEE Transactions on Computers.
[2] John W. Young,et al. A first order approximation to the optimum checkpoint interval , 1974, CACM.
[3] Brian Randell,et al. System structure for software fault tolerance , 1975, IEEE Transactions on Software Engineering.
[4] Butler W. Lampson,et al. Crash Recovery in a Distributed Data Storage System , 1981 .
[5] Leslie Lamport,et al. Time, clocks, and the ordering of events in a distributed system , 1978, CACM.
[6] Erol Gelenbe,et al. Performance of rollback recovery systems under intermittent failures , 1978, CACM.
[7] David L. Russell,et al. State Restoration in Systems of Communicating Processes , 1980, IEEE Transactions on Software Engineering.
[8] Özalp Babaoglu,et al. Converting a swap-based system to do paging in an architecture lacking page-referenced bits , 1981 .
[9] J. A. McDermid. Checkpointing and Error Recovery in distributed Systems , 1981, ICDCS.
[10] K. H. Kim,et al. Approaches to Mechanization of the Conversation Scheme Based on Monitors , 1982, IEEE Transactions on Software Engineering.
[11] David L. Presotto,et al. Publishing: a reliable broadcast communication mechanism , 1983, SOSP '83.
[12] Richard D. Schlichting,et al. Fail-stop processors: an approach to designing fault-tolerant computing systems , 1983, TOCS.
[13] Krishna Kant. A model for error recovery with global checkpointing , 1983, Inf. Sci..
[14] Andrzej Duda,et al. The Effects of Checkpointing on Program Execution Time , 1983, Inf. Process. Lett..
[15] Kang G. Shin,et al. Optimization criteria for checkpoint placement , 1984, CACM.
[16] Leslie Lamport,et al. Using Time Instead of Timeout for Fault-Tolerant Distributed Systems. , 1984, TOPL.
[17] Yuval Tamir,et al. Error Recovery in Multicomputers Using Global Checkpoints , 1984 .
[18] Augusto Ciuffoletti,et al. A Distributed Domino-Effect free recovery Algorithm , 1984, Symposium on Reliability in Distributed Software and Database Systems.
[19] W. G. Wood. Recovery Control of Communicating Processes in a Distributed System , 1985 .
[20] Robert E. Strom,et al. Optimistic recovery in distributed systems , 1985, TOCS.
[21] Leslie Lamport,et al. Distributed snapshots: determining global states of distributed systems , 1985, TOCS.
[22] Madalene Spezialetti,et al. Efficient Distributed Snapshots , 1986, ICDCS.
[23] A Antola,et al. Backward error recovery in distributed systems , 1986 .
[24] Thomas A. Cargill,et al. Cheap hardware support for software debugging and profiling , 1987, ASPLOS.
[25] Eli Gafni,et al. A Software-Based Hardware Fault Tolerance Scheme for Multicomputers , 1987, ICPP.
[26] Ten-Hwang Lai,et al. On Distributed Snapshots , 1987, Inf. Process. Lett..
[27] Stuart I. Feldman,et al. IGOR: a system for program debugging via reversible execution , 1988, PADD '88.
[28] Parameswaran Ramanathan,et al. Checkpointing and rollback recovery in a distributed system using common time base , 1988, Proceedings [1988] Seventh Symposium on Reliable Distributed Systems.
[29] Randy Pausch,et al. Adding input and output to the transactional model , 1988 .
[30] David F. Bacon,et al. Volatile logging in n-fault-tolerant distributed systems , 1988, [1988] The Eighteenth International Symposium on Fault-Tolerant Computing. Digest of Papers.
[31] K. H. Kim,et al. Programmer-Transparent Coordination of Recovering Concurrent Processes: Philosophy and Rules for Efficient Implementation , 1988, IEEE Trans. Software Eng..
[32] Jacques Malenfant,et al. Computing Optimal Checkpointing Strategies for Rollback and Recovery Systems , 1988, IEEE Trans. Computers.
[33] Vaidy S. Sunderam,et al. Process Migration in UNIX Networks , 1988, USENIX Winter.
[34] R.E. Strom,et al. A recoverable object store , 1988, [1988] Proceedings of the Twenty-First Annual Hawaii International Conference on System Sciences. Volume II: Software track.
[35] Michel Banâtre,et al. Ensuring data security and integrity with a fast stable storage , 1988, Proceedings. Fourth International Conference on Data Engineering.
[36] Jonathan M. Smith,et al. Implementing remote fork() with checkpoint/restart , 1989 .
[37] Wolfgang Graetsch,et al. Fault tolerance under UNIX , 1989, TOCS.
[38] Taesoon Park,et al. Checkpointing and rollback-recovery in distributed systems , 1989 .
[39] Thomas J. LeBlanc,et al. A software instruction counter , 1989, ASPLOS III.
[40] John C. Knight,et al. On the provision of backward error recovery in production programming languages , 1989, [1989] The Nineteenth International Symposium on Fault-Tolerant Computing. Digest of Papers.
[41] M. E. Staknis. Sheaved memory: architectural support for state saving and restoration in paged systems , 1989, ASPLOS 1989.
[42] Jean-Michel Hélary. Observing Global States of Asynchronous Distributed Applications , 1989, WDAG.
[43] Luke Lin,et al. Using checkpoints to localize the effects of faults in distributed systems , 1989, Proceedings of the Eighth Symposium on Reliable Distributed Systems.
[44] A. Prasad Sistla,et al. Efficient distributed recovery using message logging , 1989, PODC '89.
[45] Richard D. Schlichting,et al. Preserving and using context information in interprocess communication , 1989, TOCS.
[46] D. Morris,et al. A non-intrusive checkpointing protocol , 1989, Eighth Annual International Phoenix Conference on Computers and Communications. 1989 Conference Proceedings.
[47] Yuval Tamir,et al. Application-transparent process-level error recovery for multicomputers , 1989, [1989] Proceedings of the Twenty-Second Annual Hawaii International Conference on System Sciences. Volume 1: Architecture Track.
[48] Jason Gait. A Checkpointing Page Store for Write-Once Optical Disk , 1990, IEEE Trans. Computers.
[49] K.H. Kim,et al. A highly decentralized implementation model for the programmer-transparent coordination (PTC) scheme for cooperative recovery , 1990, [1990] Digest of Papers. Fault-Tolerant Computing: 20th International Symposium.
[50] Jonathan Walpole,et al. Recovery with limited replay: fault-tolerant processes in Linda , 1990, Proceedings of the Second IEEE Symposium on Parallel and Distributed Processing 1990.
[51] Willy Zwaenepoel,et al. Output-Driven Distributed Optimistic Message Logging and Checkpointing , 1990 .
[52] Rong Chen,et al. Building a Fault-Tolerant System Based on Mach , 1990, USENIX MACH Symposium.
[53] Jacob A. Abraham,et al. Forward Recovery Using Checkpointing in Parallel Systems , 1990, ICPP.
[54] Arthur P. Goldberg. Transparent Recovery of Mach Applications , 1990, USENIX MACH Symposium.
[55] Kun-Lung Wu,et al. Recoverable Distributed Shared Virtual Memory , 1990, IEEE Trans. Computers.
[56] Zbigniew M. Wójcik,et al. Fault tolerant distributed computing using atomic send-receive checkpoints , 1990, Proceedings of the Second IEEE Symposium on Parallel and Distributed Processing 1990.
[57] Bharat K. Bhargava,et al. Experimental evaluation of concurrent checkpointing and rollback-recovery algorithms , 1990, [1990] Proceedings. Sixth International Conference on Data Engineering.
[58] Andrew W. Appel,et al. A runtime system , 1990, LISP Symb. Comput..
[59] Meichun Hsu,et al. Fast recovery in distributed shared virtual memory systems , 1990, Proceedings., 10th International Conference on Distributed Computing Systems.
[60] David B. Johnson,et al. Distributed system fault tolerance using message logging and checkpointing , 1990 .
[61] David B. Johnson,et al. Recovery in Distributed Systems Using Optimistic Message Logging and Checkpointing , 1988, J. Algorithms.
[62] David B. Johnson,et al. Transparent optimistic rollback recovery , 1991, OPSR.
[63] S. Venkatesan,et al. Crash recovery with little overhead , 1991, [1991] Proceedings. 11th International Conference on Distributed Computing Systems.
[64] James R. Russell,et al. Optimistic failure recovery for very large networks , 1991, [1991] Proceedings Tenth Symposium on Reliable Distributed Systems.
[65] Flaviu Cristian,et al. A timestamp-based checkpointing protocol for long-lived distributed computations , 1991, [1991] Proceedings Tenth Symposium on Reliable Distributed Systems.
[66] Rumen Stainov. An asynchronous checkpointing service , 1991 .
[67] Andrea Clematis,et al. Process checkpointing primitives for fault tolerance: definitions and examples , 1992, Microprocess. Microsystems.
[68] Amitabh Sinha,et al. Checkpointing and recovery in a pipeline of transputers , 1992, Microprocess. Microprogramming.
[69] Barton P. Miller,et al. Optimal tracing and replay for debugging message-passing parallel programs , 1992, Supercomputing '92.
[70] Willy Zwaenepoel,et al. The performance of consistent checkpointing , 1992, [1992] Proceedings 11th Symposium on Reliable Distributed Systems.
[71] Jacob A. Abraham,et al. Implementing Forward Recovery Using Checkpoints in Distributed Systems , 1992 .
[72] Henri E. Bal,et al. Transparent fault-tolerance in parallel Orca programs , 1992 .
[73] Richard Y. Kain,et al. Rollback Recovery in Distributed Systems Using Loosely Synchronized Clocks , 1992, IEEE Trans. Parallel Distributed Syst..
[74] W. Kent Fuchs,et al. Scheduling message processing for reducing rollback propagation , 1992, [1992] Digest of Papers. FTCS-22: The Twenty-Second International Symposium on Fault-Tolerant Computing.
[75] Jacob A. Abraham,et al. Compiler-assisted static checkpoint insertion , 1992, [1992] Digest of Papers. FTCS-22: The Twenty-Second International Symposium on Fault-Tolerant Computing.
[76] C. R. Landau. The checkpoint mechanism in KeyKOS , 1992, [1992] Proceedings of the Second International Workshop on Object Orientation in Operating Systems.
[77] Michel Ruffin,et al. KITLOG: a Generic Logging Service , 1992, SRDS.
[78] Jiannong Cao,et al. An abstract model of rollback recovery control in distributed systems , 1992, OPSR.
[79] Dhiraj K. Pradhan,et al. Virtual Checkpoints: Architecture and Performance , 1992, IEEE Trans. Computers.
[80] B. R. Badrinath,et al. Recording Distributed Snapshots Based on Causal Order of Message Delivery , 1992, Inf. Process. Lett..
[81] Luís Moura Silva,et al. Global checkpointing for distributed programs , 1992, [1992] Proceedings 11th Symposium on Reliable Distributed Systems.
[82] W. Kent Fuchs,et al. Optimistic message logging for independent checkpointing in message-passing systems , 1992, [1992] Proceedings 11th Symposium on Reliable Distributed Systems.
[83] Johan Vounckx,et al. Survey of Backward Error Recovery Techniques for Multicomputers Based on Checkpointing and Rollback , 1993 .
[84] Yi-Min Wang,et al. Space reclamation for uncoordinated checkpointing in message-passing systems , 1993 .
[85] Mark Russinovich,et al. Application transparent fault management in fault tolerant Mach , 1993, FTCS-23 The Twenty-Third International Symposium on Fault-Tolerant Computing.
[86] Lorenzo Alvisi,et al. Nonblocking and orphan-free message logging protocols , 1992, FTCS-23 The Twenty-Third International Symposium on Fault-Tolerant Computing.
[87] Phil Kearns,et al. Rollback based on vector time , 1993, Proceedings of 1993 IEEE 12th Symposium on Reliable Distributed Systems.
[88] W. Kent Fuchs,et al. Progressive retry for software error recovery in distributed systems , 1993, FTCS-23 The Twenty-Third International Symposium on Fault-Tolerant Computing.
[89] Junguk L. Kim,et al. An Efficient Protocol for Checkpointing Recovery in Distributed Systems , 1993, IEEE Trans. Parallel Distributed Syst..
[90] Dhiraj K. Pradhan,et al. Processor- and memory-based checkpoint and rollback recovery , 1993, Computer.
[91] Sachin Garg,et al. Improving the Speed of A Distributed Checkpointing Algorithm , 1993 .
[92] James S. Plank. Efficient checkpointing on MIMD architectures , 1993 .
[93] W. Kent Fuchs,et al. Lazy checkpoint coordination for bounding rollback propagation , 1992, Proceedings of 1993 IEEE 12th Symposium on Reliable Distributed Systems.
[94] Jian Xu,et al. Adaptive message logging for incremental program replay , 1993, IEEE Parallel & Distributed Technology: Systems & Applications.
[95] David B. Johnson,et al. Efficient transparent optimistic rollback recovery for distributed application programs , 1993, Proceedings of 1993 IEEE 12th Symposium on Reliable Distributed Systems.
[96] Yennun Huang,et al. Software Implemented Fault Tolerance Technologies and Experience , 1993, FTCS.
[97] W. Kent Fuchs,et al. Relaxing consistency in recoverable distributed shared memory , 1993, FTCS-23 The Twenty-Third International Symposium on Fault-Tolerant Computing.
[98] Bojan Groselj,et al. Bounded and minimum global snapshots , 1993, IEEE Parallel & Distributed Technology: Systems & Applications.
[99] Jian Xu,et al. Adaptive independent checkpointing for reducing rollback propagation , 1993, Proceedings of 1993 5th IEEE Symposium on Parallel and Distributed Processing.
[100] Mukesh Singhal,et al. Using logging and asynchronous checkpointing to implement recoverable distributed shared memory , 1993, Proceedings of 1993 IEEE 12th Symposium on Reliable Distributed Systems.
[101] Tzi-cker Chiueh. Polar: A Storage Architecture for Fast Checkpointing , 1993, J. Inf. Sci. Eng..
[102] Parameswaran Ramanathan,et al. Use of Common Time Base for Checkpointing and Rollback Recovery in a Distributed System , 1993, IEEE Trans. Software Eng..
[103] Gilles Muller,et al. Performance of Consistent Checkpointing in a Modular Operating System: Results of the FTM Experiment , 1994, EDCC.
[104] Victor F. Nicola,et al. Checkpointing and the modeling of program execution time , 1994 .
[105] Robert H. B. Netzer,et al. Optimal tracing and incremental reexecution for debugging long-running programs , 1994, PLDI '94.
[106] Kai Li,et al. Faster checkpointing with N+1 parity , 1994, Proceedings of IEEE 24th International Symposium on Fault-Tolerant Computing.
[107] Erik Seligman,et al. High-Level Fault Tolerance in Distributed Programs , 1994 .
[108] J. Bruck,et al. Efficient checkpointing over local area networks , 1994, Proceedings of IEEE Workshop on Fault-Tolerant Parallel and Distributed Systems.
[109] David Cummings,et al. Checkpoint/rollback in a distributed system using coarse-grained dataflow , 1994, Proceedings of IEEE 24th International Symposium on Fault-Tolerant Computing.
[110] W. Kent Fuchs,et al. Reducing interprocessor dependence in recoverable distributed shared memory , 1994, Proceedings of IEEE 13th Symposium on Reliable Distributed Systems.
[111] Dhiraj K. Pradhan,et al. An efficient coordinated checkpointing scheme for multicomputers , 1994, Proceedings of IEEE Workshop on Fault-Tolerant Parallel and Distributed Systems.
[112] Dhiraj K. Pradhan,et al. Recovery in Multicomputers with Finite Error Detection Latency , 1994, 1994 International Conference on Parallel Processing Vol. 2.
[113] Miguel Castro,et al. A checkpoint protocol for an entry consistent shared memory system , 1994, PODC '94.
[114] Yuval Tamir,et al. Coordinated checkpointing-rollback error recovery for distributed shared memory multicomputers , 1994, Proceedings of IEEE 13th Symposium on Reliable Distributed Systems.
[115] Willy Zwaenepoel,et al. Manetho: fault tolerance in distributed systems using rollback-recovery and process replication , 1994 .
[116] Dennis Shasha,et al. PLinda 2.0: a transactional/checkpointing approach to fault tolerant Linda , 1994, Proceedings of IEEE 13th Symposium on Reliable Distributed Systems.
[117] Yi-Min Wang,et al. Optimal message log reclamation for uncoordinated checkpointing , 1994, Proceedings of IEEE Workshop on Fault-Tolerant Parallel and Distributed Systems.
[118] B. R. Badrinath,et al. Checkpointing distributed applications on mobile computers , 1994, Proceedings of 3rd International Conference on Parallel and Distributed Information Systems.
[119] Georg Stellner,et al. Consistent Checkpoints of PVM Applications , 1994 .
[120] João Gabriel Silva,et al. On the optimum recovery of distributed programs , 1994, Proceedings of Twentieth Euromicro Conference. System Architecture and Integration.
[121] Andrea Clematis. Fault tolerant programming for network based parallel computing , 1994, Microprocess. Microprogramming.
[122] M. Moura Silva,et al. Checkpointing SPMD applications on transputer networks , 1994, Proceedings of IEEE Scalable High Performance Computing Conference.
[123] W. Kent Fuchs,et al. Consistent Global Checkpoints Based on Direct Dependency Tracking , 1994, Inf. Process. Lett..
[124] Yi-Min Wang,et al. Why optimistic message logging has not been used in telecommunications systems , 1995, Twenty-Fifth International Symposium on Fault-Tolerant Computing. Digest of Papers.
[125] Jian Xu,et al. Necessary and Sufficient Conditions for Consistent Global Snapshots , 1995, IEEE Trans. Parallel Distributed Syst..
[126] Kai Li,et al. Libckpt: Transparent Checkpointing under UNIX , 1995, USENIX.
[127] Manhoi Choy,et al. On distributed object checkpointing and recovery , 1995, PODC '95.
[128] Yi-Min Wang,et al. Maximum and minimum consistent global checkpoints and their applications , 1995, Proceedings. 14th Symposium on Reliable Distributed Systems.
[129] Lorenzo Alvisi,et al. Message logging: pessimistic, optimistic, and causal , 1995, Proceedings of 15th International Conference on Distributed Computing Systems.
[130] W. Kent Fuchs,et al. Checkpoint Space Reclamation for Uncoordinated Checkpointing in Message-Passing Systems , 1995, IEEE Trans. Parallel Distributed Syst..
[131] Christine Morin,et al. A Survey of Recoverable Distributed Shared Memory Systems , 1995 .
[132] B. Randell,et al. State Restoration in Distributed Systems , 1995, Twenty-Fifth International Symposium on Fault-Tolerant Computing, 1995, 'Highlights from Twenty-Five Years'.
[133] Gilbert Cabillic,et al. The performance of consistent checkpointing in distributed shared memory systems , 1995, Proceedings. 14th Symposium on Reliable Distributed Systems.
[134] Yennun Huang,et al. An implementation and performance measurement of the progressive retry technique , 1995, Proceedings of 1995 IEEE International Computer Performance and Dependability Symposium.
[135] W. Kent Fuchs,et al. Tight Upper Bound on Useful Distributed System Checkpoints , 1995 .
[136] Micah Beck,et al. Compiler-Assisted Memory Exclusion for Fast Checkpointing , 1995 .
[137] Luís Moura Silva,et al. Portable checkpointing and recovery , 1995, Proceedings of the Fourth IEEE International Symposium on High Performance Distributed Computing.
[138] Jack J. Dongarra,et al. Algorithm-based diskless checkpointing for fault tolerant matrix operations , 1995, Twenty-Fifth International Symposium on Fault-Tolerant Computing. Digest of Papers.
[139] Jian Xu,et al. Sender-based message logging for reducing rollback propagation , 1995, Proceedings. Seventh IEEE Symposium on Parallel and Distributed Processing.
[140] Jonathan Walpole,et al. MIST: PVM with Transparent Migration and Checkpointing , 1995 .
[141] W. Kent Fuchs,et al. Reduced overhead logging for rollback recovery in distributed shared memory , 1995, Twenty-Fifth International Symposium on Fault-Tolerant Computing. Digest of Papers.
[142] Yi-Min Wang,et al. Checkpointing and its applications , 1995, Twenty-Fifth International Symposium on Fault-Tolerant Computing. Digest of Papers.
[143] Anne-Marie Kermarrec,et al. A recoverable distributed shared memory integrating coherence and recoverability , 1995, Twenty-Fifth International Symposium on Fault-Tolerant Computing. Digest of Papers.
[144] James S. Plank,et al. Improving the performance of coordinated checkpointers on networks of workstations using RAID techniques , 1996, Proceedings 15th Symposium on Reliable Distributed Systems.
[145] Jack Dongarra,et al. Fault tolerant matrix operations using checksum and reverse computation , 1996, Proceedings of 6th Symposium on the Frontiers of Massively Parallel Computation (Frontiers '96).
[146] Lorenzo Alvisi. Understanding the message logging paradigm for masking process crashes , 1996 .
[147] Georg Stellner,et al. CoCheck: checkpointing and process migration for MPI , 1996, Proceedings of International Conference on Parallel Processing.
[148] Ge-Ming Chiu,et al. Efficient Rollback-Recovery Technique in Distributed Computing Systems , 1996, IEEE Trans. Parallel Distributed Syst..
[149] Luís Moura Silva,et al. Portable transparent checkpointing for distributed shared memory , 1996, Proceedings of 5th IEEE International Symposium on High Performance Distributed Computing.
[150] Lorenzo Alvisi,et al. Trade-offs in implementing causal message logging protocols , 1996, PODC '96.
[151] Fred B. Schneider,et al. Hypervisor-based fault tolerance , 1996, TOCS.
[152] Jennifer L. Welch,et al. Implementation of recoverable distributed shared memory by logging writes , 1996, Proceedings of 16th International Conference on Distributed Computing Systems.
[153] Michel Banâtre,et al. Lessons from FTM: An Experiment in Design and Implementation of a Low-Cost Fault-Tolerant System , 1996, IEEE Trans. Reliab..
[154] Mark A. Franklin,et al. Checkpointing in Distributed Computing Systems , 1996, J. Parallel Distributed Comput..
[155] Vijay K. Garg,et al. How to recover efficiently and asynchronously when optimism fails , 1996, Proceedings of 16th International Conference on Distributed Computing Systems.
[156] Mark Russinovich,et al. Replay for concurrent non-deterministic shared-memory applications , 1996, PLDI '96.
[157] Makoto Takizawa,et al. Distributed checkpointing based on influential messages , 1996, Proceedings of 1996 International Conference on Parallel and Distributed Systems.
[158] Mukesh Singhal,et al. Low-Cost Checkpointing and Failure Recovery in Mobile Computing Systems , 1996, IEEE Trans. Parallel Distributed Syst..
[159] Sean W. Smith,et al. Minimizing timestamp size for completely asynchronous optimistic recovery with minimal rollback , 1995, Proceedings 15th Symposium on Reliable Distributed Systems.
[160] Nuno Neves,et al. Using time to improve the performance of coordinated checkpointing , 1996, Proceedings of IEEE International Computer Performance and Dependability Symposium.
[161] Achour Mostéfaoui,et al. Efficient Message Logging for Uncoordinated Checkpointing Protocols , 1996, EDCC.
[162] E. N. Elnozahy,et al. Supporting nondeterministic execution in fault-tolerant systems , 1996, Proceedings of Annual Symposium on Fault Tolerant Computing.
[163] Tzi-cker Chiueh,et al. Evaluation of checkpoint mechanisms for massively parallel machines , 1996, Proceedings of Annual Symposium on Fault Tolerant Computing.
[164] Nitin H. Vaidya. On staggered checkpointing , 1996, Proceedings of SPDP '96: 8th IEEE Symposium on Parallel and Distributed Processing.
[165] Jehoshua Bruck,et al. An on-line algorithm for checkpoint placement , 1996, Proceedings of ISSRE '96: 7th International Symposium on Software Reliability Engineering.
[166] Yi-Min Wang,et al. Integrating checkpointing with transaction processing , 1997, Proceedings of IEEE 27th International Symposium on Fault Tolerant Computing.
[167] Roll-Forward and Rollback Recovery: Performance-Reliability Trade-Off , 1997, IEEE Trans. Computers.
[168] Chong-Sun Hwang,et al. Hybrid checkpointing protocol based on selective-sender-based message logging , 1997, Proceedings 1997 International Conference on Parallel and Distributed Systems.
[169] W. Kent Fuchs,et al. Progressive Retry for Software Failure Recovery in Message-Passing Applications , 1997, IEEE Trans. Computers.
[170] B. Ramkumar,et al. Portable checkpointing for heterogeneous architectures , 1997, Proceedings of IEEE 27th International Symposium on Fault Tolerant Computing.
[171] Jack J. Dongarra,et al. Fault-Tolerant Matrix Operations for Networks of Workstations Using Diskless Checkpointing , 1997, J. Parallel Distributed Comput..
[172] Nuno Neves,et al. Adaptive recovery for mobile environments , 1997, CACM.
[173] Ravishankar K. Iyer,et al. An object-oriented testbed for the evaluation of checkpointing and recovery systems , 1997, Proceedings of IEEE 27th International Symposium on Fault Tolerant Computing.
[174] Jack Dongarra,et al. Fault tolerant matrix operations for networks of workstations using multiple checkpointing , 1997, Proceedings High Performance Computing on the Information Superhighway. HPC Asia '97.
[175] D. Manivannan,et al. Finding Consistent Global Checkpoints in a Distributed Computation , 1997, IEEE Trans. Parallel Distributed Syst..
[176] Rudy Lauwereins,et al. User-triggered checkpointing: system-independent and scalable application recovery , 1997, Proceedings Second IEEE Symposium on Computer and Communications.
[177] Robert H. B. Netzer,et al. Replaying distributed programs without message logging , 1997, Proceedings. The Sixth IEEE International Symposium on High Performance Distributed Computing (Cat. No.97TB100183).
[178] Makoto Takizawa,et al. Object-based checkpoints in distributed systems , 1997, Proceedings Third International Workshop on Object-Oriented Real-Time Dependable Systems.
[179] Makoto Takizawa,et al. Checkpoint and rollback in asynchronous distributed systems , 1997, Proceedings of INFOCOM '97.
[180] Erik Seligman,et al. Application Level Fault Tolerance in Heterogeneous Networks of Workstations , 1997, J. Parallel Distributed Comput..
[181] Jong Kim,et al. Probabilistic checkpointing , 1997, Proceedings of IEEE 27th International Symposium on Fault Tolerant Computing.
[182] Achour Mostéfaoui,et al. Preventing useless checkpoints in distributed computations , 1997, Proceedings of SRDS'97: 16th IEEE Symposium on Reliable Distributed Systems.
[183] Achour Mostéfaoui,et al. Virtual Precedence in Asynchronous Systems: Concept and Applications , 1997, WDAG.
[184] James S. Plank,et al. Experimental assessment of workstation failures and their impact on checkpointing systems , 1998, Digest of Papers. Twenty-Eighth Annual International Symposium on Fault-Tolerant Computing (Cat. No.98CB36224).
[185] Nuno Neves,et al. RENEW: a tool for fast and efficient implementation of checkpoint protocols , 1998, Digest of Papers. Twenty-Eighth Annual International Symposium on Fault-Tolerant Computing (Cat. No.98CB36224).
[186] L. Alvisi,et al. Message Logging: Pessimistic, Optimistic, Causal, and Optimal , 1998, IEEE Trans. Software Eng..
[187] Harrick M. Vin,et al. The cost of recovery in message logging protocols , 1998, Proceedings Seventeenth IEEE Symposium on Reliable Distributed Systems (Cat. No.98CB36281).
[188] Bharat K. Bhargava,et al. Design and analysis of a hardware-assisted checkpointing and recovery scheme for distributed applications , 1998, Proceedings Seventeenth IEEE Symposium on Reliable Distributed Systems (Cat. No.98CB36281).
[189] Sampath Rangarajan,et al. Checkpoints-on-demand with active replication , 1998, Proceedings Seventeenth IEEE Symposium on Reliable Distributed Systems (Cat. No.98CB36281).
[190] Luís Moura Silva,et al. An experimental study about diskless checkpointing , 1998, Proceedings. 24th EUROMICRO Conference (Cat. No.98EX204).
[191] Mukesh Singhal,et al. On the impossibility of min-process non-blocking checkpointing and an efficient checkpointing algorithm for mobile computing systems , 1998, Proceedings. 1998 International Conference on Parallel Processing (Cat. No.98EX205).
[192] W. Kent Fuchs,et al. PREACHES-portable recovery and checkpointing in heterogeneous systems , 1998, Digest of Papers. Twenty-Eighth Annual International Symposium on Fault-Tolerant Computing (Cat. No.98CB36224).
[193] Bruno Ciciani,et al. A VP-accordant checkpointing protocol preventing useless checkpoints , 1998, Proceedings Seventeenth IEEE Symposium on Reliable Distributed Systems (Cat. No.98CB36281).
[194] Luís Moura Silva,et al. System-level versus user-defined checkpointing , 1998, Proceedings Seventeenth IEEE Symposium on Reliable Distributed Systems (Cat. No.98CB36281).
[195] Adel Said Elmaghraby,et al. An Analytical Model for Hybrid Checkpointing in Time Warp Distributed Simulation , 1998, IEEE Trans. Parallel Distributed Syst..
[196] Xiaohui Wei,et al. SFT: a consistent checkpointing algorithm with shorter freezing time , 1998, OPSR.
[197] Kai Li,et al. Diskless Checkpointing , 1998, IEEE Trans. Parallel Distributed Syst..
[198] Harrick M. Vin,et al. Low-overhead protocols for fault-tolerant file sharing , 1998, Proceedings. 18th International Conference on Distributed Computing Systems (Cat. No.98CB36183).
[199] Franco Zambonelli. Distributed checkpoint algorithms to avoid roll-back propagation , 1998, Proceedings. 24th EUROMICRO Conference (Cat. No.98EX204).
[200] E. N. Elnozahy,et al. Support for Software Interrupts in Log-Based Rollback-Recovery , 1998, IEEE Trans. Computers.
[201] Vijay K. Garg,et al. A non-blocking recovery algorithm for causal message logging , 1998, Proceedings Seventeenth IEEE Symposium on Reliable Distributed Systems (Cat. No.98CB36281).
[202] E. N. Elnozahy. How safe is probabilistic checkpointing? , 1998, Digest of Papers. Twenty-Eighth Annual International Symposium on Fault-Tolerant Computing (Cat. No.98CB36224).
[203] Mukesh Singhal,et al. Low-cost checkpointing with mutable checkpoints in mobile computing systems , 1998, Proceedings. 18th International Conference on Distributed Computing Systems (Cat. No.98CB36183).
[204] Luís Moura Silva,et al. Avoiding checkpoint contamination in parallel systems , 1998, Digest of Papers. Twenty-Eighth Annual International Symposium on Fault-Tolerant Computing (Cat. No.98CB36224).
[205] Nuno Neves,et al. Coordinated checkpointing without direct coordination , 1998, Proceedings. IEEE International Computer Performance and Dependability Symposium. IPDS'98 (Cat. No.98TB100248).
[206] William R. Dieter,et al. A user-level checkpointing library for POSIX threads programs , 1999, Digest of Papers. Twenty-Ninth Annual International Symposium on Fault-Tolerant Computing (Cat. No.99CB36352).
[207] Lorenzo Alvisi,et al. An analysis of communication induced checkpointing , 1999, Digest of Papers. Twenty-Ninth Annual International Symposium on Fault-Tolerant Computing (Cat. No.99CB36352).
[208] Kai Li,et al. Memory Exclusion: Optimizing the Performance of Checkpointing Systems , 1999, Softw. Pract. Exp..
[209] Achour Mostéfaoui,et al. Communication-Induced Determination of Consistent Snapshots , 1999, IEEE Trans. Parallel Distributed Syst..
[210] Harrick M. Vin,et al. Egida: an extensible toolkit for low-overhead fault-tolerance , 1999, Digest of Papers. Twenty-Ninth Annual International Symposium on Fault-Tolerant Computing (Cat. No.99CB36352).
[211] Heon Young Yeom,et al. An asynchronous recovery scheme based on optimistic message logging for mobile computing systems , 2000, Proceedings 20th IEEE International Conference on Distributed Computing Systems.
[212] Friedemann Mattern,et al. Virtual Time and Global States of Distributed Systems , 2002 .